can be made with much validity. However, 
it is possible to compare the performance of 
trial to the performance of the teams having 
trial. On this basis one might deduce that a 
electronic simulator would result in about a 
20% superiority over familiarization via the 
operational game. At any rate, it seems safe 
tary savings can be anticipated, the present 
Thus, 
1 dealer salesman score for the PAT. Within to consistently perceive, think, and respond in a ma 
ample of 65 this score ranged from 3 to ner very different from th 
possible range is, of cour 5 to +4. The nine 


itors are described briefly below Keys 89-92 or any of the parallel special 


General dependence ( positii Any indication that iW th iverage trequency ol occurrence 


yraity Scored as 


ipport in the form of praise, attention, instruction, irrangement choices in the population as 
or assistance is a condition for activity or positive among sa personnel of the con 
affect. Scored as present a rar s obtained on Key low 


129 or if appropriate verbal statements occur on four 


or more plates. The “help” responses found to typify high level industrial executives fall within this definition.

RESULTS

Any measure which yielded a correlation with one or both of the composite indexes of sales performance sufficiently high so that the probability of occurrence by chance was less than .20 was subjected to cross-validation. This rather liberal cutting point was adopted because of the small size of the initial sample. The procedure almost guaranteed that no truly reliable test-criterion relationships would be overlooked. The following measures met this test: WAIS Similarities, WAIS Arithmetic, Personnel Classification Test-Numerical, Kuder Preference Record-Clerical, and PAT Dealer Salesman Score.


ion is an 
character 
‘red as 


appro 


indication of a pref 


maximl g the dis- 


ions ane 
iW . 
uly OVéE 


mptations and drives 


resent if a rare 1 
red 53-4 
lication of a 
ind respond 

adopted | the 
is obtained on 


+} 
n 


covering in detail the method used in obtaining the dealer salesman score for the PAT is available from the author.
TABLE 3
RELATIONSHIPS BETWEEN TEST SCORES AND PERFORMANCE
(Table values not recoverable. Surviving labels: WAIS Arithmetic; performance measures; motor oil sales, corrected; CBA sales, uncorrected and corrected.)

All relationships were positive except for those involving the Kuder Preference Record. In the case of the PAT score the positive finding at this point meant only that it had been possible to isolate a series of indicators which would yield a significant correlation with sales performance. The probability that this had been accomplished largely as a result of capitalization on chance factors remained high.

On cross-validation two of these measures continued to yield reliable correlations. The results are contained in Table 2. The other three measures proved ineffective. Even had a one-tailed test of significance been employed these latter correlations would not have approached an acceptable level.

Table 3 contains the results obtained with the PAT and WAIS Arithmetic when the total sample of 65 salesmen was employed. The PAT score provides a highly effective method of discriminating between successful and unsuccessful salesmen. Initial results obtained with the sample of 21 were apparently not a result of capitalizing on chance factors. Positive scores (+1, +2, or +3) were attained by 76% of those who were above average on the corrected composite criterion, scores of zero by 21%, and negative scores (-1, -2, or -3) by 3%. Among the below average salesmen 19% had positive PAT scores, 29% scored zero, and 52% had negative scores. Put somewhat differently, 4 out of 5 men with positive scores were above average salesmen. Only about 1 man in 20 with negative scores was equally successful. The correlation between PAT score and WAIS Arithmetic in the total sample was .19, a figure which falls considerably below acceptable levels of significance, but which is nevertheless high enough to minimize the gain to be expected from the use of multiple correlation techniques.
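
The step from the within-group percentages to the "4 out of 5" and "1 man in 20" figures can be checked with Bayes' rule. The sketch below is only an illustration: the function name is hypothetical, and it assumes the above-average and below-average groups are of equal size, which the text does not state.

```python
def share_above_average(p_given_above, p_given_below, n_above=1, n_below=1):
    """Proportion of men in a score category who are above-average salesmen.

    p_given_above / p_given_below: share of each criterion group falling in
    the score category. n_above / n_below: relative group sizes (assumed
    equal by default; the source does not report the split).
    """
    above = p_given_above * n_above
    below = p_given_below * n_below
    return above / (above + below)

# Positive PAT scores: 76% of above-average men, 19% of below-average men.
positive = share_above_average(0.76, 0.19)
# Negative PAT scores: 3% of above-average men, 52% of below-average men.
negative = share_above_average(0.03, 0.52)

print(round(positive, 2))  # 0.8, i.e. 4 out of 5
print(round(negative, 3))  # 0.055, i.e. roughly 1 in 18 to 20
```

Under the equal-groups assumption the reported percentages reproduce both summary figures; with unequal groups the proportions would shift accordingly.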

Having attained a relatively high level of predictive efficiency and thus one major objective of the study, an attempt was made to define more specifically the personality characteristics associated with effective and ineffective performance. The results of this analysis are presented in Table 4. Clearly dependence, self-confidence, and happiness go with successful sales performance; low aggression and a strong superego with less successful performance. Sociophilia and sociophobia come very close to attaining acceptable levels of significance when those possessing these characteristics are compared with all others who do not possess them. When the two are treated as extremes of a single continuum the difference between them is highly reliable (t = 2.69, p < .01), in spite of the very small number of cases exhibiting sociophobic tendencies. It seems safe to assume that sociophilia is in fact associated with sales success and sociophobia with less effective performance.

Overconformity and deviance do not emerge clearly as negative indicators. Trends are present and both apparently contribute to the prediction process, but the data are not sufficiently revealing to permit a statement to the effect that overconformists and deviants are likely to be poor salesmen. The number of cases in both instances is small and more study is clearly needed on a larger sample.

DISCUSSION

The finding that an oral test which requires the subject to solve arithmetic problems (WAIS Arithmetic) predicts sales success while a written measure involving primarily computation (Wesman Numerical) does not, is of considerable interest. The results suggest that success is a function of the ability to work out financial management and pricing problems quickly and correctly in one’s head while discussing these matters with individual dealers. Mere bookkeeping skill and knowledge of laws governing the combining of numbers is apparently of little value. One must be able to think effectively in numerical terms while in close proximity to, and interaction with, other people. In all probability this skill is a requirement rather specific to the job of dealer salesman. Other sales occupations may require different types of numerical ability, or perhaps no particular skill in the numerical area at all, although Ghiselli (1955) does report rather high validity coefficients for arithmetic tests when used to predict success in a variety of sales occupations other than that of sales clerk.

The PAT data, however, provide the greatest insight into the interaction between the individual salesmen and the specific demands
of their jobs. It would appear that a man can be a good salesman for a variety of reasons: dependence, sociophilia, confidence, happiness. There is, however, the question of the nature of the cause-effect relationship. The personality characteristics may be a result rather than a cause of superior performance. The data do not provide a clear-cut answer. It seems improbable, nevertheless, that dependence and sociophilia are consistently products of sales success, although they may be under certain circumstances. Certainly, most people develop these characteristics under very different circumstances (see Schachter, 1959). Confidence and happiness are more logical products, but even here there is some question. Neither confidence nor happiness characterizes all successful salesmen. Among the top 10 men on the uncorrected combined measure (the combined measure most closely approximating the figures the men are aware of), 6 are confident and 6 happy, but 5 are dependent (and 2 sociophilic). Perhaps the best guess is that the relationships are cyclical. Happiness or confidence or both may in many cases be a product of success and at the same time a cause of further success. If this is so, the use of a measure, including confidence and happiness, for selection purposes would continue to produce favorable results. But only those whose confidence or happiness preceded application for the dealer salesman job would have much likelihood of being hired.

Assuming, then, that all four variables may, on occasion, contribute to success, how might this come about? Presumably in the case of dependence there is a need for support and help which is achieved by having a dealer accept one’s point of view (making a sale). This support in turn motivates the salesman to go in search of new successes and new attention. Such a man will work at a fever pitch when he is in a helping and rewarding environment, but he will also work to create such an environment for himself when it is lacking. He will do all he can to avoid antagonizing others and to attain the sense of exhilaration that frequently goes with a supportive relationship. Such people are likely to obtain the cooperation of their dealers and devote considerable effort to promoting the company’s products. It is not the hard work alone, however, that produces results (strength of work motivation per se appears to be unrelated to sales success), but rather the combination of work motivation and a need to please others.

Something similar may be involved in the case of sociophilia. These are men who like to be with other people, who seek out others, and are happy in their presence. Because they wish to be with others they are unlikely to antagonize them. The cooperation of the dealer (the sale) is a by-product of a friendly, emotionally positive relationship. Happiness may well accomplish the same thing. Pleasure can easily become infectious and lead to comradeship (and incidentally a successful sales record).

All of these characteristics fit the pattern of what has been termed “the soft sell.” Self-confidence, on the other hand, appears to be a characteristic predisposing people toward “the hard sell” approach. These people will try anything. They get real satisfaction out of meeting challenges and are emotionally uninhibited. Their judgment is not always the best, but their enthusiasm and persistence are striking. The thought of failure is farthest from their minds.

In spite of this distinction between those who sell softly and those who sell hard, there is running through all these characteristics a common thread, the tendency to express emotion, especially positive emotion, freely. In contrast this emotional freedom is lacking in people possessing only the negative indicators. Lack of aggression, sociophobia, and strong superego are characteristic of rather anxious, emotionally inhibited people. Low aggression implies a tendency to bottle up anger, to keep oneself constantly under control, presumably out of fear of the consequences of letting go emotionally. The dependent person wants to please and consequently keeps aggression to a minimum as long as there is a possibility of obtaining support in this manner, but he can become angry under appropriate circumstances. The person who represses his aggression cannot permit himself to experience anger, although he may in fact be quite aggressive without realizing it.

The emphasis on emotional inhibition is also characteristic of the sociophobes. They apparently avoid other people out of a fear that close contact with others will trigger strong emotional reactions and bring on retaliation; similarly, with the people having strong superegos. They work out of a sense of duty, an internalized fear of being punished, and any sense of pleasure in their work or enthusiasm is likely to be lacking. In fact the concentration on getting the job done and doing things correctly may be so intense that the dealer’s needs are neglected. These, in contrast to the dependent group, are hard workers who, nevertheless, are likely to fail. Their very preoccupation with work may tend to antagonize their dealers. They are tempted by opportunities for free emotional expression and close social interaction, but they cannot permit themselves to indulge such wishes. In addition there is some reason to believe that many men with strong superegos are troubled by ethical considerations. The sales job on occasion puts them in a conflict situation where they are torn between a wish to do the right thing and a wish to increase sales. If they follow their moral scruples they may be less effective salesmen. If they do what they feel is wrong, they experience strong anxiety and shame. Either way the overall effect on their work is likely to be negative.

These negative indicators do not, in general, appear to be characteristics of the kind that might be expected to result from ineffective performance. A look at the 10 poorest salesmen as measured by the uncorrected composite reinforces this conclusion. Six of the men have strong superegos, five are low on aggression, and one is sociophobic. Obviously, none of these personality variables is invariably associated with failure. It is also worth noting that neither unhappiness nor low self-confidence, both of which are measured by the PAT, emerges as a negative indicator. If happiness and self-confidence are merely products of success then, using the same argument, unhappiness and a lack of confidence should be products of failure. The lack of evidence for the latter brings the former assumption into question.


It is interesting to speculate on the degree to which these findings might be applicable to other types of sales work. One is tempted to generalize. It seems logical that the dimension of emotional freedom versus emotional inhibition as it has been defined here should be significant in many types of sales work. The only obvious exception that comes immediately to mind is the retail clerk selling to customers who have made a decision to buy prior to entering the store. But such a generalization would seem to fly directly in the face of the evidence. Research indicates that very different predictors are effective with different sales occupations. Dunnette and Kirchner (1960) found that an ability measure predicted success among industrial, but not retail, salesmen. Similarly, the five Strong Vocational Interest Blank scales which predicted for the industrial group did not overlap at all with the eight scales which predicted retail success. On the other hand in the personality area the pattern changes. The Dominance scale of the Edwards Personal Preference Schedule correlated significantly with performance of both retail and industrial salesmen.

This suggests that there may be a core of characteristics which contribute to success and failure in a great variety of sales occupations and in addition a number of specific factors which have predictive value only within limited spheres. It seems quite possible that these core characteristics are primarily of an emotional or personality nature and that they are of the type identified in the present study.


This conceptualization is to some degree substantiated by Harrell’s (1960) findings based on a small sample of 21 dealer salesmen. He reports that those with successful sales records scored reliably higher on the Stability, Dominance, Self-Confidence, and Aggression and Drive scales of the Bernreuter Personality Inventory. These results seem to fit well with the conclusions as regards self-confidence and low aggression derived from the present study and with Dunnette and Kirchner’s report regarding the relation of dominance to success in widely differing sales occupations. On the other hand it should be emphasized that these are all essentially “hard sell” characteristics. Whether or not the “soft sell” person, the man characterized by dependence, sociophilia, happiness, is as effective in other sales occupations as he is in dealer sales work remains very much an open question.

EFFECTS OF VARIATION OF PROFILE FORMAT ON INTELLIGENCE AND SOCIABILITY JUDGMENTS ¹

ROBERT E. KNOX
University of Oregon

PAUL J. HOFFMAN
Oregon Research Institute, Eugene

In a variety of laboratory and practical settings, persons are confronted with a profile of scores from which a more global judgment is required. Profile scores are commonly displayed in either a percentile or T score format. This study investigates the effects of variation of these 2 formats upon judgments of intelligence and sociability. Two similarly designed experiments were conducted in which 36 undergraduate Ss each made 600 judgments from profiles in the 2 formats. A regression model was fitted to the data for each judge. Relative to T scores, the percentile format was found to be associated with greater variance of judgments, higher reliability, and higher multiple correlations. These findings support a view that judgments from profiles are influenced not only by the underlying meaning of the scores but by their graphical location as well.

¹ This investigation was supported in part by a grant from the National Institute of Mental Health, United States Public Health Service. The senior author was assisted by a predoctoral fellowship from the National Institute of Mental Health, United States Public Health Service. The analysis of much of the data was made possible through the facilities of the Western Data Processing Center at the University of California, Los Angeles.

Two recent empirically based studies described by Martin (1957) and Hoffman (1960) have undertaken the investigation of the human judgment process through utilization of multiple regression techniques and models. The general procedure in these studies has been to provide subjects with quantified predictor information plotted on profiles from which the subject is to make certain selected judgments. Typically, the subject has been asked to judge “intelligence” from plotted values on nine predictor variables (High School Rating, Status, Percent Self-Support, English Effectiveness, Responsibility, Mother’s Education, Study Habits, Emotional Anxiety, and Credit Hours Attempted); or to judge “Sociability” from values on eight of the scales from the Edwards Personal Preference Schedule: Deference, Exhibition, Affiliation, Succorance, Dominance, Abasement, Change, and Heterosexuality. On the basis of 100 or so such judgments from a single subject within one of the above domains, a regression equation can be computed. In this regression the profile scores are the predictor variables and the subject’s own judgments the criterion. The regression equation describes the judgment process with respect to a judge’s weightings of the predictors. The square of the multiple correlation (R²) is a measure of the precision by which the linear combination of the weighted variables can account for the variance of his own judgments.
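
The regression procedure described above can be sketched in a few lines. This is a minimal illustration rather than the authors' computation: the judge is simulated, only three predictors are used instead of eight or nine, the weights returned are raw regression weights (not standardized beta weights), and the names `solve` and `fit_judge` are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_judge(profiles, judgments):
    """Least-squares fit of one judge's judgments on the profile scores.

    Returns (intercept, weights, r_squared), where r_squared is the share of
    the judge's own judgment variance reproduced by the linear combination.
    """
    rows = [[1.0] + list(p) for p in profiles]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, judgments)) for i in range(k)]
    coef = solve(xtx, xty)  # normal equations: (X'X) b = X'y
    predicted = [sum(c * x for c, x in zip(coef, r)) for r in rows]
    mean_y = sum(judgments) / len(judgments)
    ss_res = sum((y - p) ** 2 for y, p in zip(judgments, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in judgments)
    return coef[0], coef[1:], 1.0 - ss_res / ss_tot

# Simulated judge (an assumption for illustration): weights two of three
# predictors and adds some inconsistency of his own.
random.seed(0)
true_weights = [0.6, 0.3, 0.0]
profiles = [[random.gauss(50, 10) for _ in true_weights] for _ in range(100)]
judgments = [sum(w * x for w, x in zip(true_weights, p)) + random.gauss(0, 2)
             for p in profiles]

intercept, weights, r_squared = fit_judge(profiles, judgments)
print([round(w, 2) for w in weights], round(r_squared, 2))
```

Because the criterion is the judge's own ratings, a low R² here indicates an inconsistent or nonlinear judge, not an inaccurate one.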

Jurgensen (1954) suggests that untrained persons interpret profile scores graphically, that is, on the basis of “where Xs appear on the profile.” Results of a preliminary unpublished pilot experiment tended to support a view that the graphical location of scores on the profile did influence judgments to some extent beyond the actual scale values represented by the plotted Xs. In this pilot experiment a group of four subjects was presented with two sets of 100 profiles in two different formats: percentile score and T score. For each profile in one format there was a corresponding profile in the other format. The procedure, then, was one in which scores were held constant between pairs of profiles while the graphical location of Xs varied between profile sets. There appeared to be marked differences between the two formats in judgments of sociability but N was too small to be conclusive.

The purpose of the present study is to further investigate the effects of variation of profile format on intelligence and sociability judgments. Two separate experiments, both methodologically similar to the above pilot experiment, comprise the study.


EXPERIMENT I

Method

domain (i.e., sociability


percentile and 7 


ment 

format (i.e 
Design. Profiles were 

sions one week apart, ¢ ich ssion 

intelligence and 75 sociability protocols.On any given 


consistin 


le judgments in both domain 
was plotted in the 


four ex 


test day a subject ma 
but all 
same profile 
perimental 


predictor intormation 
format. The 


subgro 


resulted in 
format 


design 
differing in sequence 
over the 4 
within a particular 
in Table 1 

To ensure 


It normal 

ables with zero interco: 
Since “fictitious” rather 
employed, no com] 
ments was possibl 
Subject Subject 
more girls, paid 
random from nearly 
not entirely random 
to minimize interjudg 
admitting only on 
rority, dormitory wil 


ment 


to subgroups was enti 


Prior to the ini 


Procedure 
profiles, all subjects 
written instructions « 
for each doma 
characteristics of 
were, of course il 
questions. During 
ject worked fron 
75 intelligence and 75 sociability profiles, in the same format. In order to enhance task involvement, subjects were told that the profiles were constructed from information describing real University of Oregon freshmen and that they might in making their judgments try to adopt a judgmental attitude appropriate to a member of a committee for admissions or scholarship. Judgments were made on a stanine scale with a normal distribution of judgments suggested but not forced. Two to 5 hours were required by each subject to complete a set of 150 judgments. In Figure 1 is presented a sample pair of intelligence profiles in the two formats employed in the study.

TABLE 2
VARIANCE RATIOS FOR SUBJECTS: EXPERIMENT I
(Table values not recoverable.)

Results

Each subject over the 4 weeks of testing completed judgments upon eight sets of 75 profiles each.

Basic to all results that follow are computations made for the individual subject on each set of profiles judged. That is to say, for each of the eight sets of judgments, values have been calculated for: (a) variance of judgments, (b) multiple correlations, (c) beta weights for each predictor and (d) test-retest reliability coefficients (for pairs of profile sets).



Variance of Judgments. F ratios were computed for each subject to test for differences in variance of his judgments between the two formats. Four such tests could be made for each subject for a total of 64 F ratios in all. In every case a subject’s judgment variance on the percentile format became the numerator for the ratio and his corresponding judgment variance on the T score format was the denominator. These results are summarized in Table 2. It may be seen that only four of the 64 ratios are less than unity and of those exceeding unity, 43 are significant at the 1% level and 10 more are significant at the 5% level for a total of 53 out of 64 significant Fs in all! It thus seems clear that the T score format resulted in restricted variance of judgments relative to the percentile format.
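
The variance ratio for a single subject can be illustrated as follows. The two lists of stanine judgments are invented for the example (real sets held 75 judgments each), and `variance_ratio` is a hypothetical helper; the significance test itself would compare the ratio with a tabled F value for the appropriate degrees of freedom, which is not reproduced here.

```python
from statistics import variance

def variance_ratio(percentile_judgments, t_score_judgments):
    """F ratio with the percentile-format variance as numerator,
    following the convention described in the text."""
    return variance(percentile_judgments) / variance(t_score_judgments)

# Invented stanine judgments by one subject on matched profiles.
percentile_set = [1, 2, 3, 5, 5, 6, 8, 9, 9]  # spread across the scale
t_score_set = [3, 4, 4, 5, 5, 5, 6, 6, 7]     # bunched near the middle

f_ratio = variance_ratio(percentile_set, t_score_set)
print(round(f_ratio, 2))
```

A ratio above unity means the subject spread his judgments more widely when the same scores were plotted as percentiles.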


TABLE 3
(Mean multiple correlations; table not recoverable.)

TABLE 4
TEST-RETEST RELIABILITIES AVERAGED OVER SUBJECTS: EXPERIMENT I
(Table values not recoverable.)

Multiple Correlations. It may be recalled that the criterion for the multiple correlation is the subject’s own judgments rather than some criterion external to the judgment process. These correlations have been transformed to Fisher z’s, averaged over subjects, and reconverted to “mean multiple correlations.” These values are summarized in Table 3. An analysis of variance of the Fisher z transformation was performed from which multiple Rs were found to be significantly higher in the intelligence domain than for sociability (F = 25.25, p < .001) and higher for retest than for test (F = 31.63, p < .001). The sample mean multiple R obtained from the percentile sets was higher than for T scores but fails of significance by a small amount (F = 3.69).
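
The averaging step described above (transform to Fisher z, average, reconvert) can be sketched as below. The three correlations are made-up values, and `mean_correlation` is a hypothetical name.

```python
import math

def mean_correlation(rs):
    """Average correlations via Fisher's z: z = atanh(r), mean of z, back with tanh."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))

# Made-up multiple correlations for three judges.
rs = [0.60, 0.80, 0.90]
mean_r = mean_correlation(rs)
print(round(mean_r, 3), round(sum(rs) / len(rs), 3))
```

For high correlations such as these, the reconverted mean comes out somewhat above the plain arithmetic mean, because the z scale stretches values near 1.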

Test-Retest Reliabilities. The mean coefficients are summarized in Table 4. In an analysis of variance of these data the intelligence subsets were found significantly more reliable than sociability (F = 15.49, p < .001) and the percentile format more reliable than T scores (F = 7.16, p < .05).

Beta Weights. In no case did beta weights for the predictors show significantly greater shifts between formats than occurred between test and retest of the same format in either domain.


EXPERIMENT II

Method

The second experiment is similar to the first with respect to purpose and the judgment task performed by the subjects. In this experiment, however, both male and female subjects were used and there was slight modification of design.

Design. Just as in Experiment I administration of test material was accomplished over four weekly sessions. However, this design differs from the previous one in that in Experiment II subjects on any given test day made judgments in just one domain while both profile formats were randomly intermixed within the sets. The design is outlined in Table 5. It may be seen that Groups 1 and 2 differ from Groups 3 and 4 only as to sex. The design makes possible comparison of the two formats on the same dependent variables as in the prior experiment.

Subjects. Ten male and ten female freshmen and sophomore students served as subjects and received financial remuneration for their time. Selection and randomization procedures were the same as for Experiment I.

Procedure. Procedurally, this experiment differed from Experiment I only with respect to the subsets of profiles judged during a session. These differences follow from the design differences outlined above.


Results

Variance of Judgments. These results are summarized in Table 6 and can there be seen to be in accord with the findings of the prior experiment. Seventy-three of 80 F ratios are significant at less than the 5% level and most of these at better than the 1% level.

Multiple Correlations. Mean values for these correlations are presented in Table 7. In the analysis of variance of the Fisher transformations, intelligence effects were significantly higher than sociability (F = 14.35, p < .001), retest higher than test (F = 6.97, p < .01), and percentile format higher than T score (F = 24.65, p < .001). Except for greater format differences, these results are consistent with those of Experiment I.

Test-Retest Reliability. Reliability on intelligence subsets was, on the average, found to be significantly higher than on sociability subsets (F = 9.34, p < .01) and the percentile format was found significantly more reliable than the T score (F = 10.84, p < .01). These results support the findings of Experiment I. A summary of these mean reliability coefficients is shown in Table 8.

Beta Weights. Shifts in beta weights between different formats again were no greater than shifts between test and retest administrations for the same format.
DISCUSSION

The most striking and consistent results concern differences in the variability of judgments between the two formats. It is not difficult to summon an explanation post hoc for these differences. From Figure 1 it can be noticed how the T score format appears constricted and “squeezed in” relative to the percentile. It may be surmised that judges are responding not only to the underlying meaning of the scores, but to the graphical position of the points on the profile in some absolute sense. A point so many inches from the midline of the profile seems to carry meaning in addition to the score that the point represents. It would follow that extreme judgments would be less likely to be made from T scores since the placement of extreme judgments along the rating scale on the profile page would result in judgments graphically more extreme than the predictor scores themselves.
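
The "squeezed in" appearance follows from the nonlinear relation between the two scales. The sketch below assumes normalized T scores (mean 50, standard deviation 10) and normally distributed predictors; it further assumes, purely for illustration, that both formats are plotted against grids of similar physical width, which cannot be confirmed without Figure 1. The function name is hypothetical.

```python
from statistics import NormalDist

def percentile_to_t(percentile):
    """Convert a percentile rank (0 < percentile < 100) to a normalized T score."""
    z = NormalDist().inv_cdf(percentile / 100.0)  # standard normal deviate
    return 50.0 + 10.0 * z

# Percentile ranks spread over almost the whole 0-100 range map to T scores
# that stay within roughly two standard deviations of 50.
for pct in (2, 16, 50, 84, 98):
    print(pct, round(percentile_to_t(pct), 1))
```

Scores that sit near the edges of a percentile profile therefore fall much closer to the midline when plotted as T scores, even though the underlying values are the same.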

Perhaps a statistically sophisticated judge could give the scores a “psychological spread” beyond the actual spatial orientation of the points on the profile and more or less consistent with their true meaning. For the judges of this study, limited in sophistication to what had been provided by the instructions and answers to their questions, this would be a more difficult task, and these predictors would, for them, have a psychologically restricted variance. This being the case, the T scores would provide less information than percentiles from which to make differentiated judgments. Unless a judge is provided with information which enables him to differentiate between persons on the attributes to be judged, the best judgment he can make is the average of all the attribute values. If a number of such judgments is to be made, small variance would result. Although the potential information here was, in fact, identical for both formats, the obtained reduced variance of judgments associated with T scores suggests that differentiating potential in this format was not so fully realized as for the percentile format.
Although the experimental procedure followed in these experiments precluded use of a criterion for accuracy, it is interesting to speculate concerning the implications for accuracy which are suggested by the differential variance in judgments. Do these differences represent underdifferentiation from the T score format as suggested above, attempted overdifferentiation from the percentile format, or both? If overdifferentiation occurs from the percentile scores, then we might expect, as a consequence of those arguments advanced by Cronbach (1955) and Crow (1957), that judgmental accuracy will suffer when this format is used. If, on the other hand, these differences result from underdifferentiation on T scores, then we might conclude that predictor information is being lost on this format, adverse effects upon accuracy again being the consequence.

Other significant findings involve the multiple correlation
coefficients and reliability coefficients. Before discussing these
further, however, it should be pointed out that the computation of
the multiple correlation was accomplished from T scores regardless
of the type of profile format. It follows that the values for
multiple R would undoubtedly be affected since what would be a
“best” fit for one type of scaling of predictor scores would not be
best for a nonlinear transformation of such scores. Whether the value
for R increases or decreases when one goes from T score to percentile
format depends, of course, upon which of the types of scale most
closely yields a linear relationship to the judgment criterion. If
the relationship between percentile scores and judgment is linear,
then the computations of R will yield a somewhat reduced value. The
results relative to multiple R and reliability are otherwise related
and consistent. That is, the multiple correlations are higher in a
direction predictable from knowledge of the reliabilities of the
judgment comprising the criterion since low reliabilities have a
restrictive effect on multiple R.
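
The point about nonlinear rescaling can be illustrated with a small
sketch. It assumes, for illustration only, that T scores are
normalized (mean 50, SD 10) so that a percentile is the normal
cumulative proportion for the same standing; the function names and
the range of scores are hypothetical and are not taken from the
study. The sketch shows that the two scalings are not linearly
related, so a linear fit computed on one cannot be "best" for the
other.

```python
import math
from statistics import NormalDist


def t_to_percentile(t_score):
    """Convert a T score to a percentile, ASSUMING normalized T scores
    (mean 50, SD 10). This assumption is for illustration only."""
    z = (t_score - 50.0) / 10.0
    return 100.0 * NormalDist().cdf(z)


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical predictor values spanning three SDs either side of the mean.
t_scores = list(range(20, 81, 5))
percentiles = [t_to_percentile(t) for t in t_scores]

# The correlation is high but below 1, because the percentile scale
# compresses the extremes relative to the T score scale.
r_between_scalings = pearson_r(t_scores, percentiles)
print(round(r_between_scalings, 3))
```

A judgment that is linear in percentiles is therefore slightly
nonlinear in T scores, which is the sense in which an R computed from
T scores would be somewhat reduced.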
Another point to be mentioned is that greater familiarity with
percentiles would undoubtedly contribute to better understanding
and more precise handling of such data than would be the case for
T scores. It would then reasonably follow that the predictive value
of percentile scores would be enhanced not only because the scores
mean more to the judge in the first place but because the judgment
model would remain more stable in the course of judgment. Pertinent
to this point is the present finding that test-retest reliability is
greater for the percentile format than for T scores.

In general, results of Experiment II are 
consonant with those from Experiment I. 
Some interesting and significant differences 
may be found, however. For example, the F 
ratio of 3.69 relating to differences between 
format mean multiple Rs that just failed of 
significance in Experiment I is found to take 
a value of 24.65 in Experiment II. Similarly 
the F ratio for reliability differences between 
formats is much greater for the second experi- 
ment than the first. It thus seems apparent 
that presentation of both formats during a 
single session had greater effect than format 
shifts one week apart. Parenthetically, it may 
be noted that the magnitude of differences in 
domain displayed the same tendency. Dif- 
ferences were greater in Experiment I when 
both domains were administered in a single 
session than in Experiment II when domain 
was constant during a single session. 


These results suggest operation of some 
sort of “anchoring” effect so often referred 
to in psychophysical experiments. The anchor- 
ing point for T scores and percentiles together 
could be expected to differ from the anchoring 
point for each of these formats separately. 
This effect would tend to give greater em- 
phasis to differences arising from the two 
profile formats when they were intermixed. 


REFERENCES 


Cronbach, L. J. Processes affecting scores on “understanding of
others” and “assumed similarity.” Psychol. Bull., 1955, 52, 177-193.

Crow, W. J. The effect of training upon accuracy and variability in
interpersonal perception. J. abnorm. soc. Psychol., 1957, 55, 355-359.

Hoffman, P. J. Generating variables with arbitrary properties.
Psychometrika, 1959, 24, 265-267.

Hoffman, P. J. The paramorphic representation of clinical judgment.
Psychol. Bull., 1960, 57, 116-131.

Jurgensen, C. E. Reporting employment test scores to supervisors.
J. appl. Psychol., 1954, 38, 277-278.

Martin, H. T., Jr. The nature of clinical judgment. Unpublished
doctoral dissertation, Washington State College, 1957.


(Received February 14, 1961)







Journal of Applied Psychology 
1962, Vol. 46, No. 1, 21-25 


THE EFFECTIVENESS OF WHYTE’S RULES: 


“HOW TO CHEAT ON PERSONALITY TESTS” 


MARVIN E. SHAW


University of Florida 


A study was conducted to determine the degree to which following
Whyte’s (Organization Man) rules improves personality test scores.
94 undergraduate Ss took the Bernreuter Personality Inventory twice,
either following Whyte’s rules (dishonest, or D) or responding
normally (honest, or H), providing HH, HD, DH, and DD conditions.
Results showed that: (a) H scores differed significantly from D
scores; (b) D scores indicated Ss were significantly more extraverted
and sociable and tended to be more self-sufficient and self-confident
and less neurotic than did H scores; (c) Ss represented by D scores
were not chosen by personnel supervisors for a supervisory position
significantly more than were those represented by H scores. It was
concluded that following Whyte’s rules offers no great advantage when
applying for a supervisory position.


In a somewhat controversial volume, Whyte 
(1957) violently attacked the use of person- 
ality tests as selection devices. His specific 
criticisms may be not so loosely paraphrased 
as follows: 


1. The analyst bases his evidence upon unreasonable evidence; the
test-taker always loses.

2. Personality tests are self-confirming.

3. The established norms are more of a playback of what the
organization demands than the reality of the person’s own self.

4. Successful persons do not fit the profiles.

5. Most tests have not been properly validated.

6. You can’t measure future performance in critical situations.

7. Even if you could, it would be immoral to do so!

He then suggested that the thus maligned 
applicant must try to protect himself from 
the unscrupulous testers by cheating. Ap- 
parently, his reasoning is that since testers 
are immoral, the test-taker should be also; 
that is, two wrongs make a right. He then 
proceeds to tell the test-taker how to cheat 
on personality tests and interest inventories. 

In this paper, the concern is with the ef- 
fectiveness of Whyte’s rules; no attempt is 
made to either support or deny his other al- 
legations concerning tests and testers. Specif- 
ically, an attempt was made to answer the 


question: Do persons who follow Whyte’s 
rules in completing a personality inventory 
earn more acceptable scores than do persons 
who respond to the questions in the usual 
manner? 


METHOD 


The subjects used in this study were second semester juniors and
seniors at the University of Florida. For the most part, they were
majoring in business administration, and were either applying for
jobs in industry at the time the tests were given or expected to do
so within a year. Thus the sample is representative of the population
actually subjected to this kind of testing in the industrial
situation.

The Bernreuter Personality Inventory was chosen for investigation
for two reasons: (a) it is one of the personality tests mentioned
specifically by Whyte as being susceptible to systematic bias by
following his rules, and (b) it is widely used in industry.

The inventory was administered to four groups of subjects, each
group being tested on two occasions one week apart. Before each
administration, the following job description was read to all groups:

Supervisor in a plant which manufactures electrical equipment.
Interprets blueprints, sketches, verbal or written orders;
determines procedures of work; assigns duties to electricians and
inspects their work for quality and quantity; maintains harmony
among workers; employs, trains and discharges workers; assists
subordinates during emergencies.


Three days before the first administration to two of the groups,
subjects were given copies of The Organization Man (Whyte, 1957)
with instructions to read the appendix and bring it with them to the
scheduled meeting. The first administration to these groups was
preceded by the following instructions:

Assume that you have applied for the position which has just been
described, and you have been asked to take this personality test.
You want the job, and you are willing to do whatever is necessary to
get hired. Therefore, you will follow Whyte’s rules concerning how
to cheat on personality tests. You may refer to these rules at any
time while completing the inventory.

The other two groups were not given any information regarding
Whyte’s rules prior to the first administration, and were instructed
as follows:

Assume that you have applied for the position which has just been
described, and you have been asked to take this test. You want the
job; therefore, you will answer the questions as honestly as you can.


On the second administration, one of the two groups which had
followed Whyte’s rules on the first administration was given the
same instructions as before, while the other one was asked to
respond honestly on the second administration. Similarly, one of the
two groups which had responded honestly the first time was given the
same instruction again, whereas the other group was given the same
treatment as the group which had followed Whyte’s rules on the first
administration. Thus there were four experimental orders or
conditions: dishonest-dishonest (DD), dishonest-honest (DH),
honest-honest (HH), and honest-dishonest (HD). By comparing honest
scores with the dishonest ones it is possible to determine the
degree to which subjects are able to change their scores by
following Whyte’s rules. The design also permits a determination of
consistency under the different conditions.

After the tests had been scored, the scores of subjects in the HD
and DH groups were plotted in the form of profiles; i.e., the scores
were plotted on a graph, with two profiles for each of the subjects,
one representing his honest scores and one his dishonest scores.
Thus, there were 51 graphs representing the 51 subjects who took the
Bernreuter both honestly and dishonestly. The graphs were identified
only by code numbers and were submitted to seven personnel
supervisors representing four industrial organizations that
currently were using the Bernreuter Personality Inventory as a part
of their selection program. Except for one company, these personnel
supervisors had had considerable experience in using the inventory
to select first-line supervisors. Each supervisor was given the job
description quoted above and asked to indicate which of the “two”
persons represented by the profiles he would prefer to hire for this
position, whether this choice was based upon a perceived difference
in acceptability or was merely a forced choice not based upon a
perceived difference in acceptability, and whether he would hire
either both or neither of the candidates. The option of accepting or
rejecting both candidates represented by profiles was offered to
obtain some estimate of the proportion of candidates that would
actually be acceptable in the sense that he would be offered the job.


RESULTS 


One of the first questions that might be asked is whether subjects
responded differently under the different instructions. To answer
this question, scores on the first and second administrations were
correlated. The results are given in Table 1. As can be seen, the
correlations were considerably higher for the HH and DD than for the
HD and DH conditions. Considering these as estimates of reliability,
it is evident that subjects in the DD group were as consistent in
their responses as were those in the HH group, whereas subjects in
the other two groups were highly inconsistent.


This latter finding, of course, merely indicates that subjects were
responding differently under the different instructions. Further
evidence for this conclusion is found in Table 2 which presents the
mean differences between the first and second administrations in the
various conditions. None of the differences is significant by t test
in the HH and DD conditions, whereas differences are significant for
four of the six scale scores in the HD condition and for three of
the six in the DH condition.

TABLE 2
MEAN DIFFERENCES BETWEEN FIRST AND SECOND ADMINISTRATIONS OF THE
BERNREUTER PERSONALITY INVENTORY IN EACH EXPERIMENTAL CONDITION

In Tables 1 and 2, there are noticeable differences between the HD
and DH conditions. A t test of the differences between the
corresponding correlations in Table 1 (using the z transformation)
indicated that these differences could be attributed to chance.
Similarly, a test of significance of the differences between the HD
and DH conditions shown in Table 2 failed to demonstrate an
acceptable level of significance. (It should be noted that in the HD
column of Table 2 a positive value indicates that the honest scores
were higher than the dishonest ones, whereas in the DH column a
negative value means that honest scores were higher than dishonest
ones. Therefore, all differences except those for the
Dominance-Submission scale were in the same direction.) These
results suggest that there were no order effects relative to biased
responding.
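
The comparison of two correlations by way of the z transformation
can be sketched as follows. This is a minimal illustration, not the
computation used in the study: it assumes two independent groups,
uses the normal approximation rather than a t table, and the
correlations and group sizes in the example are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_difference(r1, n1, r2, n2):
    """Test the difference between two independent correlations.

    Each r is converted with Fisher's transformation z = atanh(r);
    the difference is divided by its standard error,
    sqrt(1/(n1 - 3) + 1/(n2 - 3)). Returns the critical ratio and a
    two-tailed probability from the normal curve (an approximation).
    """
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    ratio = (z1 - z2) / standard_error
    p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(ratio)))
    return ratio, p_two_tailed


# Hypothetical test-retest correlations for an HD group and a DH group.
ratio, p = fisher_z_difference(0.45, 26, 0.30, 25)
print(round(ratio, 2), round(p, 2))
```

With groups of this size a difference of .15 between two correlations
falls well within chance expectation, which is the kind of outcome
reported above.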

The next question is whether subjects were able to improve their
scores by following Whyte’s rules. In attempting to answer this
question a number of tests were made. First, the differences between
the first and second administrations for the HD and DH groups were
compared separately with the differences between the first and
second administrations for the HH and DD groups (here treated as
controls), again by t test. The results are given in Table 3. As can
be seen, scores on “Introversion-Extraversion” and “Sociability”
yielded significant differences on three of the four tests made,
“Self-Sufficiency” and “Confidence in Oneself” on two of the four
tests, “Neurotic Tendency” on one of the four tests, and
“Dominance-Submission” on none of the tests made. The mean scores of
the HH groups were also compared with those of the DD groups for the
first and second administrations taken separately. Only the scores
on “Sociability” were significant for both administrations
(p < .001 and p < .01, respectively). “Introversion-Extraversion”
scores differed significantly for the second administration only
(p < .01). No other difference was significant.


Taken together, these results suggest that when subjects followed
Whyte’s rules their scores indicated that they were significantly
more extraverted and more sociable than when they did not. Their
scores also tended to indicate that they were more self-sufficient
and self-confident, and less neurotic when the rules were followed,
but differences were reliable on only a few of the tests made. Thus
the dishonest scores seem to be more socially acceptable than the
honest ones.

Finally, one may inquire as to whether the “improved” scores aided
subjects in obtaining the supervisory job for which they were
instructed to apply. Evidence on this point comes from the personnel
supervisor’s choices of honest and dishonest profiles. Before
presenting this data, it is necessary to say something of the
reliability of the supervisor’s judgments. A rough estimate of this
was obtained by comparing the choices of each supervisor with those
of every other supervisor, and computing the average percent
agreement. This was found to be 80% for profile choices and 63.2%
for agreement as to whether neither or both profiles would be
acceptable for the job. Thus the profile choices appear to be
reasonably reliable, with acceptability choices somewhat less so.
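
The agreement estimate described above, in which each supervisor is
compared with every other supervisor, can be sketched in a few
lines. The data below are hypothetical (three judges, four profile
pairs) and serve only to show the arithmetic; "D" and "H" stand for
choosing the dishonest or the honest profile of a pair.

```python
from itertools import combinations


def mean_pairwise_agreement(choices_by_judge):
    """Average percent agreement over all pairs of judges.

    choices_by_judge maps a judge label to that judge's list of
    choices, one per profile pair, all lists in the same order.
    """
    pair_percentages = []
    for judge_a, judge_b in combinations(choices_by_judge, 2):
        choices_a = choices_by_judge[judge_a]
        choices_b = choices_by_judge[judge_b]
        matches = sum(a == b for a, b in zip(choices_a, choices_b))
        pair_percentages.append(100.0 * matches / len(choices_a))
    return sum(pair_percentages) / len(pair_percentages)


# Hypothetical choices, not data from the study.
example_choices = {
    "judge_1": ["D", "H", "D", "D"],
    "judge_2": ["D", "H", "H", "D"],
    "judge_3": ["D", "D", "D", "D"],
}
agreement = mean_pairwise_agreement(example_choices)
print(round(agreement, 1))
```

With seven supervisors there are 21 such pairs, so the reported
figure is an average over 21 pairwise percentages.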

Since there is some question about the meaning of the forced choices
in cases in which supervisors indicated they would hire neither or
both of the persons represented, comparisons were made both for
total choices and for clear-cut choices only. These comparisons are
shown in Table 4. As can be seen, dishonest profiles were chosen
more often than honest ones in both comparisons, although neither
comparison yielded a statistically significant difference (all
choices, χ² = 4.76, p < .30; clear-cut choices, χ² = [illegible].38,
p < .50). It also is interesting to note that on the average 52% of
the profile pairs were checked “would hire neither,” whereas only
7.5% were checked “would hire both.”

As expected, within company agreement was greater than that between
companies.

TABLE 4
CHOICES OF HONEST AND DISHONEST PROFILES BY PERSONNEL SUPERVISORS
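
As a rough illustration of how a chi-square can be formed from
counts of dishonest and honest profile choices, the sketch below
tests a single pair of counts against an even split. This is an
assumption made for illustration: the layout and degrees of freedom
of the chi-squares reported above are not stated in the text, and
the counts used here are hypothetical.

```python
def chi_square_equal_split(n_dishonest_chosen, n_honest_chosen):
    """Chi-square (1 df) for two counts against a 50-50 expectation.

    Sums (observed - expected)^2 / expected over the two categories,
    where expected is half of the total number of choices.
    """
    total = n_dishonest_chosen + n_honest_chosen
    expected = total / 2.0
    return (
        (n_dishonest_chosen - expected) ** 2
        + (n_honest_chosen - expected) ** 2
    ) / expected


# Hypothetical counts for one supervisor judging 50 profile pairs.
statistic = chi_square_equal_split(30, 20)
print(statistic)
```

A value of this kind would be referred to a chi-square table with one
degree of freedom; values below 3.84 do not reach the .05 level.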


DISCUSSION 


It should be emphasized that the concern here is not merely with the
question of whether respondents can bias their scores, but rather
with the question of whether they can follow a set of rules and
achieve a desired result of getting hired for a supervisory job. If
the former were the concern, this study would be of little interest
since it is already well known that inventories of this sort can be
biased (Anastasi, 1954; Bernreuter, 1933). The results of this study
substantiate this conclusion.


The question of whether Whyte’s rules actually helped the
respondents is not so clear. Hanawalt and Richardson (1944) reported
that supervisors are more extraverted, self-confident, and dominant,
and less neurotic than nonsupervisors. Results from this study
indicate that dishonest respondents agreed more nearly with this
pattern than did honest respondents on all scales except the
dominance scale. To this extent then, following Whyte’s rules did
produce a more desirable result. It has already been mentioned that
the dishonest scores appear to be more socially acceptable. However,
these results are based upon average scores for each scale taken
separately, and thus do not take into account the possibility that a
given subject might improve on some scales but earn less desirable
scores on others. Overall, his scores might be less acceptable; one
“bad” score might do more harm than several “good” ones.


For this reason it is believed that the personnel supervisors’
choices for the job are more sensitive measures of the effectiveness
of following Whyte’s rules. In this respect dishonest respondents
had only a slight, statistically insignificant advantage over the
honest respondents. All the personnel supervisors chose more of the
dishonest than honest profiles, but the differences were so small
that this appeared to be a chance event. This result might be
attributed to lack of diagnostic skill of the personnel supervisors;
however, if this were the case it is difficult to understand the
high agreement among judges (80% overall interjudge agreement). From
a practical point of view, the respondent who prefers to “play the
percentages” might consider following Whyte’s rules when faced with
tests of the Bernreuter type. At the same time, he might well keep
in mind that almost half of all respondents did better by responding
honestly. From the personnel worker’s point of view, biasing of this
sort does not appear to be serious; questions of the validity of the
test scores are more pressing.

All in all, it is evident that following Whyte’s rules did have some
effect on the inventory scores, albeit a small one, and in the
predicted direction. In considering the interpretation of these
results as discussed above there are a number of points that must be
kept in mind. First of all, the present investigation was limited to
a particular supervisory job. The effects of following rules might
be much greater in relation to a different type of job, say that of
salesman. Also, it should be kept in mind that several years have
passed since Whyte first suggested his rules. It may be that
personnel supervisors have modified their interpretation of scores
to take into account the effects of his suggestions. To the extent
that this has happened, the present study fails to test Whyte’s
contention, but it doesn’t alter the fact that at the present time
and within the limits of this study his rules do not appear to be
highly effective.


TARGET DETECTABILITY AS A FUNCTION OF
TARGET SPEED, NOISE LEVEL,
AND LOCATION


ROBERT D. BALDWIN, DAVIS J. CHAMBLISS, AND A. D. WRIGHT


Human Resource 


An experiment was conducte 

observed targets displayed in each i 4 ontiguous { 1egre s( ectors at 
each of 4 radial velocities under 2 lev ) ual 1 s lysi variance of 
the mixed latin-square design did not reveal reliabl ices in scores due 
] 


ec 


to velocity, noise level, or velocity orders. More target designations occurr 
for the inner than the outer contiguous scope sectors (| 1), although th 
l lls per sector wel lifferent (p 5). Thes 


ney rather than 


The detection of a target on a radar scope ination a 
has been found to be dependent, in part, upon — ship between target locatio 
the size of the target (pip) and the relative 
brightness of the pip in relation » back METHOD 
ition level (Erdmann & Mye 
1958). When the background illumination 


level (noise level) of the radar scope becomes sition ator PPI) |] tec 
I 


P7 
greatly intensified, due to such events as elec- ntre le of a NIKI 
tronic counter-measures, the detectability of noise lev 

" ; : irget simulat 
target de reases. In rpm PPI sweep 


hypothesized that the 
the pip would influence targ tection. he target at the f the sweep 
Although it is a common field observation eer loca 

ripher f the The sweep inter 

le re initially adjusted 

facilitated by object movement, the relation- ic early defined marks and a “salt-and 

ship between target motion and detectability display of recel noise. These controls wet 


that detection of a camouflaged object i 


on radat scopes has not been studied experi- sti : 7 = for all observer 


mentally (Harris, 1960). Generalizing fron 


1 soft glow from tl 


p ited in front of the 
field observations, it was hypothesized red warning lamp to 
the dependence of detection upon target s d ti bserver’s rigl ll other sources of illumination 
would vary with the intensity f the vis I xtinguis! ymetric measurement of the 
ate . inter it} of h ent illumi: ation could not be 
priate equipn 

Blackwell (1953 ha Cl ted el] ible dif- The tar ts always were presented wit 
ferences in thresholds as a function of signal gree section of the PPI scope. As show! 

B leg 


aker (1958) and White and Ford 1}; four adjacent 30-degree sectors wer 


(1960) have found differential detectabilities 


location 


rrees Wwe ol 


on radar indicat to be related to the scan to } | 


= ) > 
C IVC 


north to du 
l near the center of the scope as 
ning method employed It was of interest to \ C, and D. The sector li and letterings did 


determine if such perceptual biases existed in 0° an urget when viewed 


situations involvi less than a full 360 de- 
} 
i 


grees of search (i.e., sector search). An experi- 

ment, therefore, was designed to determine TJarg: 
the lationship betwe rget detectabilit r ‘ndivi ] 

he relationship | oorroum target detectability, The targets were presented individually, each for 
target speed, and noise level, both separately 30 seconds in one of the four sectors. Each target 





TARGET DETECTABILITY 


INNER SECTORS ) Zero—jammer level zero; (6) Light—jammer 
— Pe level less than the lower level for the test phase; 
ie i] ) Moderately Heavy—jammer level set between 
; the two noise levels for the test phase 

Three targets were presented for each of the noise 

levels during training, one target each at 5 
ind 700 knots. All observers were given identical 
training. After a brief rest period, the test phas 

began 
Testing. The test phase consisted of target run 





ight successive targets being presented at each of 
four target speeds: 2 , 600, and 800 knot 
Each track was pres | econds with 
seconds between suct ve s. At each target 
peed, two target cS rrogramed in each of 
the four scope s or The if I presentation ol 
targets in sectors was ! 1 1 for each observe 
Phe particular entry 
1 sector was randomized acro 
spatial for hoice method i 


PPI ved, nowledge of results was pro 


Was progran 
yond the 


yt ) 


Target Velocity and Noisi 


in th et aria rhe average number of targets detected at 


approximately 


fe each velocity under the two noise levels is 
int, - 
irget quite when presented in Table 1. 
the Moderate i noise ; tionary target The correct detection scores were analyzed 
measured 4 ee ee ee ~ according to the Type IV mixed analysis 
icy of variance described by Lindquist (1953, 
p. 285). The variables included in this analy 
: sis were target velocity, noise level, and ve- 
Forty « ; 
had prior 
operation 
The observers rang irom 18 ») 23 years 
mean age é »; SD nd had Gener 
Technical aptitude area s : nging from 
120 (Mean GT 108 ; ( Of the 
) were Regular Army 
Forces Act RFA) trainee 
The observers were rar 


the eight test-group 


Procedure 


Training 
consisted 
duration 
Blackwell 

one of the fou 

edge of results was prov d for ea 


Three noise levels were employed in the training

locity orders. This analysis did not reveal any reliable differences
among the detection scores associated with any of the variables.

Supplementary covariance analyses were conducted using the target
detection score during training as the control factor and target
velocity and noise level as the independent variables. These
analyses yielded reliable decrements associated with the higher
noise level, but there were no effects due to either the variations
in target velocity or the velocity order.


Response Bias

Since Blackwell (1953) has reported threshold differences associated
with spatial location of the stimulus, it was of interest to
determine if observers exhibited any response bias in detection of
the targets: that is, did the observers designate one scope area
more than the others (targets were presented an equal number of
times in each sector for each observer). Table 2 presents the
average number of correct, incorrect, and total designations for
each scope sector. This table also indicates for each sector the
proportion of total calls that were correct and the proportion of
the total targets presented in each sector that were correctly
identified.

The differences among the average total number of times each sector
was selected by the observers in the Heavy and Moderate Noise groups
were compared by analyses of variance techniques (Lindquist Type I
Mixed Factorial). This analysis revealed reliable inequalities in
the frequency of observer responses to each area (F = 6.82;
df = 3/114) but no effects due to the different noise levels.

TABLE 3
SUMMARY OF ANALYSIS OF DIFFERENCE SCORES
(Correct minus incorrect designations)

Differences among the total calls for the scope sectors were tested
for significance using Duncan’s new multiple range test (1955),
employing p = .05 for 2-mean differences:

1. There was no significant difference either between Sectors B and
C or between Sectors A and D.

2. Significantly more calls occurred for Sector B than for Sector A
or D and for Sector C than for A or D.

These results indicate that the inner sectors, B and C, were
selected more frequently than the outer sectors, A and D.

An analysis next was made of the correct detections in each sector.
This analysis of variance revealed reliable differences among the
average numbers of correct detections in each sector (F = 4.80;
df = 3/114), but no differential effects associated with the noise
levels. As indicated in Table 2 the ratio of mean correct to mean
total designations appears to be fairly constant over the four scope
sectors. The comparability of the ratios of correct to total
designations was evaluated statistically by computing the algebraic
difference between the correct and incorrect designations made by
each observer for each sector.1 The difference scores were treated
by analysis of variance techniques with the results indicated in
Table 3.

1 The product-moment correlation between the ratio score and the
difference score for each observer was computed for Sectors A and C.
Since the obtained coefficients were 0.98 and 0.96, respectively,
the analysis of variance employed difference scores rather than
ratio scores, thereby avoiding problems associated with the need to
adjust the ratio scores for index correlation (McNemar, 1949).
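
The relation between the two kinds of score can be illustrated with
a short sketch. The observer counts below are hypothetical and are
not data from the experiment; the sketch only shows how a difference
score (correct minus incorrect) and a ratio score (correct over
total) are formed for each observer and then correlated.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (correct, incorrect) designations by six observers
# for one scope sector.
designations = [(6, 2), (5, 3), (4, 4), (7, 3), (3, 5), (5, 5)]

difference_scores = [c - i for c, i in designations]
ratio_scores = [c / (c + i) for c, i in designations]

r_difference_ratio = pearson_r(difference_scores, ratio_scores)
print(round(r_difference_ratio, 2))
```

When observers make roughly similar numbers of total designations the
two scores are nearly interchangeable, which is the condition under
which substituting difference scores for ratio scores loses little.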

The analysis revealed reliable variations among the average
difference scores for the four sectors. The mean difference scores
(correct minus incorrect) for Sectors A, B, C, and D, respectively,
were Loo, +0.20 + 1.02.

Again using the new multiple range test with p = .05 for 2-mean
comparisons, analysis of the difference scores yielded the
following:

1. There were no significant variations among the difference scores
for Sectors A, B, and D.

2. The difference score for Sector C was significantly lower than
the scores for A and B, but was not significantly different from
Sector D.

DISCUSSION

The results of the experiment confirmed well established
observations concerning the disruptive effects of high background
noise on the detectability of targets on radar scopes.

The results of the test do not provide evidence supporting the gross
hypothesis concerning a relationship between detectability and
target velocity. The average number of targets detected was
approximately the same for the four target velocities. The failure
to confirm is not believed to be a reflection of a false hypothesis,
but a reflection of an incomplete hypothesis. The absence of a
relationship between detectability and velocity in this study may be
attributable to the relatively large size of the target at the edge
of the radar scope. Target detectability is known to be dependent
upon target size, contrast, and form definition. It may be
conjectured that the stimulus characteristics influencing
detectability form a hierarchy such that when a critical level of
one attribute is reached, subordinate characteristics (for example,
velocity) have no reliable influence upon detectability. The target
sizes available for this study were large relative to actual target
“pips” and the form of the target was well defined.

Analyses of the distribution of the observers’ responses over the
four contiguous scope areas indicated that more target designations
occurred for the inner sectors than the outer sectors, even though
targets were presented equally often in all sectors. The differences
between the response frequencies by sectors tended to be consistent
for both the correct and false target designations. The inequality
of the response frequencies for the inner and outer sectors may be
attributed to several factors: the scanning method employed, and
differences among frequencies of reinforcement for the scope
sectors.


Previous research has found that for a 36( 


degree PPI search situation, the relative fre 


quenc ies of observet Scans or looks, Was 
greater near the midpoint of the sweep radius 
than at the (White & 
Ford, result ap- 


pears to be a function of the searching tech 


inner or outer ends 


1960: Baker, 1958). This 


nique employed; e.g., the observer may begin 
his search near the midpoint of the sweep, 
scan to the outer periphery, scan back to the 
center, and then scan to the inner end of the 
sweep line. This scanning pattern results in 
more scans of the center of the sweep than 


at the ends. Use of such a scanning pattern 


obviously results in a greater frequency of 


correct detections at the center of the sweep 
than at their end 

In a similar manner, scanning the four con- 
tiguous 30-« 


sult in a 
inne! 


legree sectors would tend to re 


greater Irequency Of sci 


(B and C) 
a statistical or probability basis 


sectors than the outer sec- 


tors. On 


greater proportion of correct detections would 


be expected as a result of the greater incidence 


| 
of scans. 
that more false designa 
tions also occur in the 


accounted 


The observation 
inner sectors could be 
lor by reference to the pring iple ol 
reinforcement. Since the 


i 
designated 


observers correctly 
more targets in the inner sectors 
they received a greater incidence of positive 
reinforcement for designating targets in these 
As a result, 


was not 


actual target | 


known but a designation re 


areas when the 


cation 


required, the inner sectors would 


i 


Was 


selected more frequent] outer 


The results of the 
ference by 


y than the 


areas analyses of the 


sectors between correct and 


rect designation however, do not support 


reinforcement hypothesis since the ratio ot 


correct to total designations was not equal for 


all sectors. These results that the 


Suggest 
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differences among the target designations by 


sectors were due to differences in observer 


expectancies based scan frequency 


rather than reinforcement frequency. 


upon 

THE USE OF OPEN-END DATA AS AN AID IN WRITING MULTIPLE-CHOICE DISTRACTERS: AN EVALUATION WITH ARITHMETIC REASONING AND COMPUTATION ITEMS

BERNARD RIMLAND AND EDWIN ZWERSKI


the items in open-en 
ing items moderat 


irequency with 


\ method often advocated for obtaining of an investigation in which discriminatio1 
: 


multiple choice (MC) item distracters entails indices of items written it ind without the 
administering the item stem in open-end’ aid of OE data were « ared. He found 


(OE) form so that distracters may be chosen _ that student item writers, working without OE 
on the basis of the frequency of the resulting 


data, could compose MC items which wert 
incorrect responses. This procedu as discriminative as those wri 
available. He also found 
item writers differ in 

O] 
loops i npnatil i h endorse 


ment of the 


that the frequency with which exam ; data 


in each incorrect respot Wi rovid hat 
indication of 
will be chose 

Kelley (1 

ae 3 ‘ : : ‘ “we ¢ convinced 
of findings obtained { 1 ni imental | | nw 

y t makers, nowever € 

evaluation of the 
vocabulary test, Kelle: | a 
method was of auectionahle utility i lhe present study provides an evaluation of 

Ll ( as IUCSLIiOlia i \ l ( ‘d 


confusion alternatiy 


sa Veleaw tailed the OE method by determining the relationship 

Frederiksen and Satt soe _ between the proportion of examinees who 
ietnil doe enn of te iil cae eek oi te enter an item response in an OF test form 
construc ting an arithmet ! ' ion test and the proportion of examinees who choos« 
Aiea te ih sacaeied wsteasies the response when the item is subsequenth 


the procedure, the ee administered in five-choice MC form. The 
i L ® il i _ }. il ) 

over the OE and M erie of Form 6 of the Navy Arithmetic Test 

) i I 4 all I id . < Oil i ‘ 


; : : ; (Rimland, 1958), which e1 
the correct response then ne % an 


hal} 


7 +6 , . er data were obtained during the development 
item difficulty remained relatively nstant . , 


5 ph : : method of distracter selection 

frequency -o1-choice l mal la- ‘ , Arit] 

: tains two subtests, Arithm 

tively constant pans 7 i ) os 
ipso ere ted the findi and Arithmetic Reasoning 


analyzed separatel 


Proc 


CCAUSE 


prima 





BERNARD RIMLAND AND EDWIN ZWERSKI 


the development of the Navy 
possibilities for 


the present stud 


Arithmetic Test, the 


a) based on all 
tatistical analyses of the data 


| item responses combined, N 168 
in (42 items + choices per item) and (b) based on 
only the three numerical distracters appearing in both 
use of “Right answer not given” as a the OE and MC item 
choice in the MC form of each Computation-type “Right 
item This choice could 


were limited. One problem con 
cerned the 


forms (i+ omitting the 
answer not given” and _ pooled 
not be matched with any’ choices), N = 126 (42 items 3 choices per item). 
response in the OE format. As a 

partial solution, in correlating the OE and MC 2 
response frequencies, the 


given” was paired, 


response 
single numerical 


Reasoning items 
choice, “Right answer not +. : 

: The same basic analy 
for each Computation item, with 
he pooled frequencies of all minor incorrect re , . 

- ‘ ‘ } % 1 : j to a difference in the form of the data 
ponses ppli by th ubje $y “minor re “Rp: : } 
' eT fem Right answer not given” was not ed in the M¢ 
ponse iS meant any incorrect response ranked f - . . 17 . 
; 7 form of the Reasoning items, it was possible simply 
fourth or higher in frequency, the three most pop- : : : Pi 
, to compare the distracter response frequencies of thi 
ular OE incorrect responses having been taken for 
vithout pooling 


ses were 


performed on the 
Reasoning items, but with 


minor modifications 


Inasmuch 


: : “ay two formats 
use, along with “Right answer not given” in the 
MC form. The assumption underlying the pooling ie 
mom cori f most fre 
of ming responses was that an examinee, after 
. : q tly repo responses in the OE 
computing an answer not listed as an MC choi 


id | which appeared in the MC m. The N 
response, wouk eT I 


‘minor responses.” The 
Ns therefore fo the 9 within-item product 


responding equi whether im iin dees cates Seat dd * 
response on OE form o Wess aoa ‘ 

yn the MC form : sa iia ries per Nem 
No item which had “Right answer not given” as 

the right answer was included in the 
An additional problem arose because the numbe! 


writing in a minor 


checking “Right answer not given” « 


all trac 10 


analysis 


RESULTS 


: Computation Items 

of examinees responding correct was a relatively 

ro ) yrt ) oO ? ) ti ] l wer ( t xal rm . - . - > . . 

ge proportion of the total n SP arcgeqcatat The interquartile range of the 42 within 

attempting the items in both the OE and MC forms 7 ti f t} oe 

. . a , : em corre.ations le Tour-cnoice rmat 

Chis resulted in highly skewed distributions of item nie oa an - fe fOur-CAOIC sormat 
ponse trequencies, and it was therefore considered was found to be from .02.to .87, with a 

le to omit the corr ponse prior to median value of .67. 

nalyzing the dat 


correlation based on the com 

bined responses of all 42 items for the four 

naval . ° 

‘ 1 7 

cruits for both the Computation and Reasoning Choice compar! _ (A 168) be +8, Whig 
The MC distracter frequencies were based on the total correlation computed with only tl 

distracter choices (A 126: 

iowel Cc) 


MUS CHES sargel questionable ‘‘Right 
samples ecruits. Th Ss are some 


The total 
The incorrect 1 frequenci for the open- 
ended iten I on samples of 5 


eparate samples ¢ recruit 
combined upp« nd 


1ree 
omitting the 


answer not given” and 


: pg pooled response pall ) was .52. 
mallet an would be optimal for this stud ‘ i let — 

coiitiniiie he Hs Interpretation of any individual correlation 
ematically lower than they would be with larg: coefficient based on an N of four would be 
sample Phat is, larger | would yi highly dubious, but the obtained median value 
proportions which would turn reduc hance ef of .67 does 


for this reason th 


Suggest a moderately useful 
lationship between OE and MC distr 
response freque ncies for the (¢ omputation 
esponding to any given ite items. The two total correlations of 
De it +} lin , 4 lat r { 


tations 
ial 


ALLO! 
52 and 
48 tend also to support this interpretation. 


Reasoning Items 

1. Computation items a ; r 5 , , 
rhe interquartile range for the 39 within- 
Within-iten product-momen orrelations wet . . - ° 
oy dit : —e item correlations was found to be from .14 

ymputed, for each ms, between the OF “aig aif 

M( of .49, while the 
total correlation based on the combined dis- 
tracter choices of all 


7 wii oo 
silln iliarialliaiilien: Sines tani to .87, with a median value 


re ported incorrect 


which also appeared in th 3 items (\ 156) 
e ob aap of the was .55. 

These findings, when considered in conjunction with those obtained from the analysis of the Computation items, by and large tend to support the conclusion that there is a moderate association between OE and MC distracter response frequencies for both Arithmetic Computation and Arithmetic Reasoning items.


Negative Within-Item Correlations

Inspection of the data showed a number of negative within-item correlations for both the Computation and Reasoning items. The proportion of negative item correlations was nearly equal for the two item types, being about one-fourth. The correlations for the Reasoning items were more strongly negative, however, three of the eight exceeding -.7, while none of the 10 negative within-item correlations for Computation items exceeded this value.

A comparison of items having negative within-item correlations with those having positive within-item correlations showed no difference in terms of difficulty nor item-total test correlations (r_it), for either Computation or Reasoning items. The negative correlations appear largely to be the chance deviations which must be expected with samples as small as four.
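
The claim that negative within-item correlations are to be expected by chance when each coefficient rests on only four pairs can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of the study: it assumes bivariate-normal scores and a hypothetical population correlation of .50 (neither is taken from the data), and the function names are illustrative.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def share_negative(rho, n=4, trials=20000, seed=0):
    """Proportion of simulated samples of size n whose r is below zero.

    rho is the assumed (hypothetical) population correlation.
    """
    rng = random.Random(seed)
    negative = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        # Build y so that its population correlation with x equals rho.
        ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
        if pearson_r(xs, ys) < 0:
            negative += 1
    return negative / trials


if __name__ == "__main__":
    print("share of negative r at rho = .50, n = 4:", share_negative(0.5))
```

Even with a clearly positive population correlation, a sizeable minority of four-pair samples gives a negative coefficient, which is in keeping with the interpretation offered above.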


The Effect of Changing Item Distracters

The question sometimes arises in test development whether several variations of promising items should each be tried or whether the testing time should be spread among more varied, albeit possibly less promising items. Several of the Computation items had yielded more than three equally good distracters as a result of the OE administration. Rather than discard the surplus distracters, six of the Computation items were used twice in later experimental forms, the same item stem being followed by a series of distracters which differed somewhat (by one or two choices) in alternate test forms. These minor distracter changes were found to have practically no effect on either the difficulty or discrimination values of any of the six items. This tends to support Toops' observation on the stability of item validity despite minor changes in distracters.

These indications that it may not pay to try several variations of an item may not apply to all other item types, however, especially verbal analogy items, in which the nature of the task posed by the item is to a great extent a function of the choices provided.


DISCUSSION AND CONCLUSIONS

Within the obvious limitations of the present study, it has been found that MC item distracters selected on the basis of OE response frequencies tend to retain their relative popularity when readministered in MC form. It must be emphasized, however, that the present findings are based on Arithmetic Computation and Reasoning item types only and generalization to other item types may not be appropriate. Whether the moderate relationships found justify the increased cost of employing the OE method depends upon such considerations as the intended use of the test and the facilities available to the test developer. Some of the suggestions offered in an unpublished paper by the senior author (copies available on request) may be found useful in alleviating the clerical burdens imposed by the OE method.

THE MMPI IN THE SELECTION OF HOSPITAL AIDES


MMPI he position 


l ts i the ( 
tarded. MMPI subscale 


compared with ¢ 
reasons not 
mpl institution, or (c) 
rentiated betweer The few ditt 
, } 
groups when configurational 


Conclusioi 


Ghiselli and Barthol (1953 
attempted 


Wel 
ing 113 studies which had 1 to dem sonality inventory, the MMPI 
onstrate the utilits OI perso! ality inventories the commonly used 
in the selection of employees, concluded that of 


iciency of the most fre 


and most rel 
iob success. conti! 


l ied empio 
inventor 


They found no ‘me, short of empirical 
which one could determin 


trial and error, by 


whether a_ particular 


would be helpful in 


personality inventory 
1 selecting employees for 


particular position. Psychologists 
plicitly 1 


iy recognized 


mded by l 
IpIric illy dete inil u 
ventories would hel 


g ) 
| help them in which employes 

election — tas} Disturbingly enough 
earche ncerned with validating 
s¢ lectior ol el 


have found conflicting 


greement. For 


m ve 
i and Reed (1957 


ind Levins 


y f 
Abily { 


good in¢ 1 


poor psychiat 


Derg 


ind Yerburg, Holz 


studies frequently 
tests and/or different criteria, 


t} 


the conflict- 


erpretation ol 


reporte | here Was designed 


to 
umber of different procedures 


he 





MMPI IN SELECTION oF HospITAL AIDES 


rABI 


ir 


subscales were compared simultaneou 
means of a Lindquist Type I “mixed” anal 

of variance (Lindquist, 1953). This analys 
Was performed sepa ately for each sex. Table 


» 
i 


3 show that although there were sig 
ificant differences between the mean subscal 


I scores within the combined D, R, and S 


f | } } } 3 
lor Doth sexes (subscaie Comparisons 


there is no pattern (groups x subscale 
{ 


comparisons) and no single subscale (groups 
comparisons) which differentiated betwee 
D, Hy, Fe any two of the three groups of either sex. That 
s, although the grou lid show a patter? 
of mean subscale 7 scores (see Fig 


groups showed the same genet 


+ ’ ’ ‘ } ‘ — ] * ] 
ern and the relative level of 


ALYSIS AND 


1 : 


first question to which this 
an answer was_ whether 

MMPI profiles differed significantly 
or whether there were significant differ- 
ences between any ol the three groups’ mean 
T scores. In order to answer this 
question the subscale 7 scores for each origi 
nal investigation sample on each of the 10 
the same for all three of the groups. Since it had been decided that differences would be considered real only if they were found in both the original and cross-validation samples and since neither of the comparisons in which we were interested was significant, this analysis was not performed with the cross-validation group.

The statistics on this first comparison were 
performed for each sex separately. It was 
found before performing each of the following 
analyses that there were no differences be- 
tween the sexes on these measures within each 
of the experimental groups. Therefore, all 
subsequently reported comparisons were made 
between combined sex groups. 

The second attempt to differentiate among the original D, R, and S groups involved a comparison of the number of subscales upon which the individuals in the three groups scored higher than T score 60. The number of individuals who had one, two, three, or four subscales (no subject had more than four subscales with a T score greater than 60) with T scores greater than 60 in each of the three groups was counted and these frequencies were cast into a 3 X 4 table (see Table 4). Because 6 of 12 cells in this table had expected frequencies of less than 5 it was considered unwise to compare the groups with the only appropriate measure, the chi square test. Categories were not combined so that the chi square would have been appropriate because the way in which they would have had to have been combined would have obscured the critical comparisons. However, inspection clearly shows that the number of individuals having T scores greater than 60 does not differentiate between the three experimental groups.

[Fig. 1. Mean subscale T scores of psychiatric aides]

[TABLE 4. The Number of Individuals in the Original Sample D, R, and S Groups with 0, 1, 2, 3, or 4 T Scores Greater than 60. Columns: original sample groups. Rows: number of subscale T scores greater than 60.]

The third attempt to differentiate between the original sample D, R, and S groups employed signs which were developed for the purpose of differentiating between psychoneurotics and schizophrenics (Taulbee & Sisson, 1957). The sign score was calculated for each person in the original investigation sample. The frequency with which subjects in all three groups earned each sign value was cast into a table with groups on one dimension and scores on the other (see Table 5). The score categories were then combined into two groups: those scores from 1 to 6 and those scores above 6 (see Table 6). A chi square test revealed that the three groups did not differ significantly with respect to the number of individuals who fell above and those who fell at or below a score of 6 (χ² = .81, df = 2).

[TABLE 5. Number of Individuals in Sample Groups with Each Taulbee Score Value]

[TABLE 6. Number of Individuals in Each Original Investigation Group with Taulbee Scores Above 6 or 6 and Below. Columns: original sample groups. Rows: Taulbee score.]

Since Taulbee and Sisson cautioned that their particular signs would probably not differentiate among normals, we next followed their procedure for developing signs specifically for differentiating among our three groups. The 10 subscales with which we were working were arranged in descending order of T score for each subject. Rank order comparisons of all of the scales for each subject were made by assigning the higher of the two scales a plus and the lower of the two scales a minus, e.g., if the Pd (Scale 4) was higher than Ma (Scale 9) by one or more T scores, the 4-9 comparison was scored plus and the 9-4 comparison was scored minus. Chi square tests of differences between the S and D and the S and R groups were then calculated on the basis of the sums of pluses and minuses for each of the subscale pairs. Five of the 45 pairs (Pd-Ma, Pd-Pt, Pd-Si, Ma-Si, Ma-Pt) differentiated at the .05 level between the S and R groups and one of the 45 pairs (Pd-Si) differentiated between the S and D groups. However, none of these pairs differentiated in the cross-validation sample. We concluded that the Taulbee technique of developing diagnostic signs did not lead to a scale which would differentiate between our three groups.

The next effort to differentiate the groups involved determining whether any pair of high ranked subscales was more likely to be associated with membership in a particular group. The three highest subscales for each subject were recorded. The number of individuals who did and who did not have each possible combination of two subscale scores in the top three scores was then counted. Chi square scores were then calculated for the occurrence and nonoccurrence of each pair between the S and D and the S and R groups in the original sample. Four combinations of two subscales (Pd-Mf, Pd-Ma, Ma-Mf, Pa-Pd) were found which were more likely (at the .05 level of significance) to be in the top three ranked scores of an individual if he was a member of either the D or R groups. These differences were not found in the cross-validation sample. It was concluded that it is impossible to determine to which of the three groups an individual is most likely to belong by determining upon which pair of subscales he scores the highest.
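
The failure of the significant pairs to hold up on cross-validation is easier to appreciate once the number of comparisons is counted. The sketch below is only an illustration: it treats the pairwise tests as independent, which they are not (pairs share subscales), and the function name is hypothetical.

```python
import math


def chance_hits(n_scales=10, alpha=0.05):
    """Return (number of pairs, expected chance hits, P(at least one hit)).

    Assumes independent tests, which is a simplification here because
    different pairs involve the same subscales.
    """
    n_pairs = math.comb(n_scales, 2)          # 10 subscales give 45 pairs
    expected = n_pairs * alpha                # hits expected under the null
    p_any = 1 - (1 - alpha) ** n_pairs        # chance of one or more hits
    return n_pairs, expected, p_any


if __name__ == "__main__":
    pairs, expected, p_any = chance_hits()
    print(pairs, round(expected, 2), round(p_any, 3))
```

Under these simplifying assumptions roughly two "significant" pairs would be expected per set of 45 tests even if no real differences existed, so a handful of hits in the original sample is weak evidence until it is replicated.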
Finally, the three cross-validation groups' protocols were pooled, arranged in a random order, and given to two clinical psychologists who were instructed to determine from the protocol to which group each individual belonged. The clinicians were told how many protocols of each sex were in each group. The Taulbee and Sisson sign score was recorded on each protocol. When the clinicians had made their judgments the protocols were cast into a 3 X 3 table for which the actual classification formed one axis and the judged classification formed the other axis. Chi square tests were used to test the hypothesis that the protocols were randomly distributed throughout the tables. This hypothesis could not be rejected for either clinician (χ² = 2.48 and 3.28, df = 4). It was, therefore, concluded that the clinicians were unable to judge which person fell into which group significantly better than we would expect by chance alone when the criterion for the judgment was the MMPI profile.
Agreement between the clinicians was determined by casting their judgments into a 3 X 3 table for which one clinician's judgments formed one axis and the other clinician's judgments formed the other axis (see Table 7) and calculating a contingency coefficient for this table. The contingency coefficient was found to be .37, which is significant beyond the .05 level of significance as determined by chi square calculated for the same table (χ² = 10.0, df = 4).
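
The relation between the chi square values and the contingency coefficient reported for the 3 X 3 tables can be checked with a few lines of arithmetic. The sketch below uses the standard closed form of the chi square tail probability for even degrees of freedom and Pearson's contingency coefficient, C = sqrt(χ² / (χ² + N)). The sample size is not stated in the text, so `implied_n` is only a back-calculation from the reported values, and all function names are illustrative.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of chi square for a positive even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form used here needs a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))


def contingency_coefficient(chi2, n):
    """Pearson's contingency coefficient for a table with n observations."""
    return math.sqrt(chi2 / (chi2 + n))


def implied_n(chi2, c):
    """Sample size implied by a reported chi square and coefficient."""
    return chi2 * (1 - c ** 2) / c ** 2


if __name__ == "__main__":
    for value in (2.48, 3.28, 10.0):
        print(value, round(chi2_sf_even_df(value, 4), 3))
    print("implied N:", round(implied_n(10.0, 0.37), 1))
```

With df = 4, the two classification chi squares fall well short of the .05 level while the agreement chi square of 10.0 passes it, which matches the conclusions drawn in the text.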

We next tried to determine whether the clinicians' pooled judgments were more successful in classifying the subjects. All of the subjects who were classified in the same way by both clinicians were divided into two groups. All of the subjects whom the clinicians agreed should still be employed were put into one group and those subjects whom the clinicians agreed had been dismissed or had resigned were put into the other group. The agreement of these pooled judgments with the actual classification of the subjects was determined by casting the subjects into a 2 X 2 table with pooled clinicians' judgments on one axis and actual classification on the other axis. Fisher's exact probability test with Tocher's modification was used to test the hypothesis that the distribution of subjects in this table was not significantly different than we would expect by chance alone. This hypothesis could not be rejected (exact probability of tables more extreme than that found is equal to .126). We, therefore, concluded that even when our two clinicians agreed upon the classification in which to place a person, their judgment was no more accurate than we would expect by chance alone.
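
For readers unfamiliar with the exact test used on the 2 X 2 table, a minimal sketch of the one-sided Fisher exact probability follows. The cell counts in the example are hypothetical (the study's table is not reproduced in the text), and Tocher's modification, which adds a randomization step at the boundary of the rejection region, is not implemented here.

```python
import math


def fisher_one_sided(a, b, c, d):
    """P(upper-left cell >= a) with all margins of the 2 x 2 table fixed.

    Table layout:  a  b
                   c  d
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = math.comb(total, col1)
    p = 0.0
    # Sum hypergeometric probabilities over tables at least as extreme.
    for k in range(a, min(row1, col1) + 1):
        p += math.comb(row1, k) * math.comb(row2, col1 - k) / denom
    return p


if __name__ == "__main__":
    # Hypothetical counts: judged employed/separated by actual status.
    print(fisher_one_sided(3, 1, 1, 3))
```

The probability is the sum of the hypergeometric probabilities of the observed table and of every table with a larger count in the agreeing cell, given the same row and column totals.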




DISCUSSION

The data reported here suggest that MMPI scores fail to differentiate between individuals who either stay in the employment of the institution, resign from the institution or are dismissed from the institution. This is true even though a wide variety of ways of using the MMPI were employed. Apparently this study supports Cuadra and Reed (1957) and Levine (1951) in the finding that there is no "cook-book formula" available for using the MMPI as a basis for hiring hospital aides.

However, there are grounds upon which to question the results of the present investigation. The groups which were used in this study were selected from the population of all of those persons hired as attendants at the institution. The MMPI was often, though not always, used as one of the criteria of employability when these attendants were hired. It can be contended that the use of the instrument with which we were trying to demonstrate differences between the groups in the initial selection of the population from which the groups were selected reduced the variability within the population to such an extent that it would be impossible to find differences between groups chosen from this population. Although this is a potent argument, there are three judgmental considerations and one empirical finding which temper its incisiveness.

In the first place, it was frequently necessary during the period of time in which the population, from which our samples were selected, was hired to select attendants for employment regardless of their MMPI scores. Over 25% of those employed had been considered unsuitable on the basis of high MMPI scores.

In the second place, the manner in which the MMPI was used during this time, i.e., to eliminate from consideration for employment those persons with T scores greater than 70, has no demonstrated validity. Also, although this technique would result in a population with fewer extreme scores, it should not, so far as we know, differentially affect the magnitude of subscale scores below T 70. We would not, therefore, expect this procedure to affect the pattern analysis techniques used in this study even though it may have affected the mean score comparisons.

In the third place, an examination of the analysis of variance data in Tables 2 and 3 suggests that the lack of statistical significance was not due to any overall lack of within population variance. Rather, all of the groups within the population appeared to be highly variable. This factor of variability in itself could, of course, obscure trends.

Beyond this judgmental evidence against the suggestion that the present study is inconclusive, there is some empirical data. Using exactly the same procedure described above for the selection of the R and S groups, a sample (NH) was drawn from those applicants who were not hired during the period in which this study was conducted. Lindquist Type I mixed analyses of variance were then calculated using the subscale T scores of all 4 groups: D, R, S, and NH. Neither the male nor the female NH group revealed a profile of T scores which differed from the other three groups in either altitude or pattern. This finding would seem to effectively answer the contention that the selection technique employed during the time over which this study ranged precluded the finding of significant results.

In view of these arguments, are we to con- 
clude that personality tests are of no value in 
hospital al Let 


evidence this 


the selection of 
sider the 
but als 
vestigating the same problem 
(1957 


¢ 
8) 


con 


us 


om study 


+ ' ry 
not only f 


m other 


the evidence from studies in- 
) isolated scores on 


Personality 


Cuadra and Reed 
Test 
between 


California which 


excellent fair, 


However no relationship 


differentiated 700d 


ind poor aides 


whatever was found | the predictions 


ind actual job tenure at erformance in the 
s-validation sample 
sing the MMPI and 
lest 
significant relat 
sonalit 


the Cornell Index 
Levine (1 found 


ionship between per- 


Graphomotor 51) 


y and hospital attendant efficiency” as 
determined by merit ranking 
Kline (1950), using a self-constructed biographical and personality measure, reported some success in screening successful and unsuccessful psychiatric aides. However, he did not report cross-validational data. Furthermore, there is some question as to whether his measure tapped personality or intellectual variables. This is important because intellectual factors have been shown to have some power in differentiating good and poor aides (Cliff, Newman, & Howell, 1959; Levine, 1951; Love, 1955).

Yerburg, Holzberg, and Alessi (1951) were able to differentially identify 5 of “definitely good” aides and 32% of “definitely poor” aides by use of the revised beta examination and the multiple choice Rorschach. However, they did not report any cross-validational data. Furthermore, Shotwell, Dingman, and Tarjan (1960) have pointed out the great problem of rating or otherwise delineating “good” and “poor” aides in a hospital for the retarded.

In view of the findings of the present study, the findings of Cuadra and Reed (1957) and the general dictates of experimental methodology, cross-validation of empirically isolated predictors such as we are concerned with here is essential. Since the two studies which report that personality tests are useful for selecting aides do not offer cross-validation evidence, those results must, consequently, be interpreted as merely suggestive. It would, therefore, seem that there is no strong evidence for the belief that the personality tests which have been tried are useful in the selection of hospital aides.
But to conclude from this that personality factors do not affect an aide's competence is unacceptable to those of us who work daily in the hospital setting. We suggest that the following are among the reasons for the failure of experimental tests to verify that personality variables alter competence:

1. Criteria which are reliable enough for experimental use may not be valid indicators of aide success. For instance, whether an aide is retained in employment is often dependent upon the availability of a replacement as well as upon his rated proficiency.

2. One personality pattern may not be predictive of success in all of the various settings, e.g., chronic back wards and acute recent admission wards, in which aides work, because different personality patterns may be necessary in different settings.

3. Presently available personality measures may not be sensitive enough to differentiate between the people in the essentially “normal” populations from which employees are selected.

4. Institution administrators of different levels are not in good agreement as to the personality characteristics desirable in “good” attendants.

Considerations of the sort just offered are 
often included in reports of “nonsignificant” 
research, i.e., research in which the experi- 
mental hypotheses are not verified. Frequently 
they are taken to mean that the experimental 
hypothesis was not adequately tested and, 
therefore, need not be rejected. This is a 
justifiable implication. However, the present 
investigators would like to tentatively ques- 
tion the advisability of retaining a hypothesis 
which has repeatedly been subjected to tests 
which were not adequate to either confirm 
or reject it. We realize that it is difficult to 
decide at what point a hypothesis should be 
judged untestable. However, we would like to point out that
research such as that reported and discussed herein is concerned
with the empirical isolation of response-inferred predictor
variables. Such studies take no advantage of existing theories about
what sorts of people should be successful in the situations which we
are interested in predicting. Furthermore, such studies neglect the
influence of stimulus variables such as preplacement training.
Perhaps our obvious failure to predict in this situation is
partially due to our failure to work from a theoretical framework
and our neglect of stimulus variables.

NOTE ON THE OPTIMAL LENGTH FOR
VISUAL INTERPOLATION

K. F. H. MURRELL

University of Bristol
When Ss were required to interpolate a scale division into tenths,
Churchill was unable to confirm the “law” of constant visual angle.
In this his result disagrees with other research on interpolation
into fifths, which has become the basis of a forthcoming British
Standard recommendation. Dial designs based on the BSI
recommendation have been validated on the shop floor. Possible
causes for the difference in findings are discussed.

Churchill (1959) has reported that, with interpolation of a scale
division¹ in tenths, he was unable to substantiate the “law” that
optimum scale spacing could be specified in terms of visual angle.
He quite rightly makes the reservation that this finding applied
“under the conditions tested.” It not infrequently happens, however,
that when being discussed in relation to practical applications a
finding of this kind loses its qualification and becomes accepted as
a general principle; in case this should happen with this finding it
should be pointed out that it is directly contrary to that of
Murrell, Laurie, and McCarthy (1958), who used the method of
Kappauf, Smith, and Bray (1947) with dials 2, 3, 4, 6, and 8 inches
in diameter at distances of 6, 9, 12, 18, and 24 feet. Their results
show that combinations of dial sizes and reading distances which
give the same visual angle are read with a similar degree of
accuracy. Below a visual angle of 10 minutes performance
deteriorates as the logarithm of decreasing visual angle, while
above 10 minutes there is little or no improvement in performance at
about 98% correct readings. This figure has been confirmed by Jones²
who repeated Murrell’s experiment with smaller dials and closer
distances. The results can be expressed graphically in a number of
ways (as were Churchill’s results) but they all confirm the finding
that the optimum scale spacing can be specified in terms of visual
angle for the distances and sizes used. This finding has formed the
basis of a British Standards Institution recommendation on
“Graduation of Industrial Instruments for Quantitative Measurements”
which will be issued shortly, and has been confirmed by experiments
with dials conforming to this recommendation using workers in the
chemical industry as subjects (Spencer, 1962).

¹ These terms are defined in the British Standard Glossary of Terms
Relating to the Performance of Measuring Instruments.

² C. Jones, personal communication, September.

In view of this, possible causes of the difference seem worth
examining:


1. Churchill’s subjects interpolated in tenths, Murrell’s in fifths.
It is generally considered that in the practical situation
interpolation in fifths is the maximum.

2. The subjects in Churchill’s experiment were given limited
exposure times of .5 and .25, which are substantially less than the
times which subjects would take if they made readings with no time
limitation.

3. Murrell et al. also used a reading distance of 2.5 feet and noted
that when interpolation was in tenths proportionality seemed to
break down (loc. cit. p. 189); in this they agree with Churchill.

4. Churchill used a single straight scale interval, whereas Murrell
et al. used a 270° sector scale with 20 scale intervals, each of
which would be slightly curved. It is not thought that interpolation
is influenced by the curvature or otherwise of the scale interval.
On the other hand, since there was no choice of scale interval in
Churchill’s experiment, his task should be the easier.

Taking the two sets of results together, the following conclusions
can be drawn: (a) that for the scales interpolated in fifths, the
optimum scale spacing can be specified in terms of visual angle;
(b) that the accuracy with which scales can be interpolated in
fifths is substantially greater than when interpolated in tenths;
(c) that the law of constant visual angle may break down if dials
are read from very short distances and scale intervals are large
and if interpolation is in tenths.
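The notion of constant visual angle used in this note can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the smallest dial diameter is taken to be 2 inches, and the dials are paired with the distances in the order listed. It computes the angle subtended by each dial and the size that would subtend 10 minutes of arc at a 2.5-foot reading distance.

```python
import math


def visual_angle_arcmin(size, distance):
    """Visual angle, in minutes of arc, subtended by an object of the
    given size at the given distance (both in the same unit)."""
    return math.degrees(2 * math.atan(size / (2 * distance))) * 60


def size_for_angle(distance, arcmin):
    """Size that subtends `arcmin` minutes of arc at `distance`."""
    return 2 * distance * math.tan(math.radians(arcmin / 60) / 2)


# Assumed pairing: dial diameter (inches) with reading distance (feet).
# The 2-inch value is an assumption; the pairing order is also assumed.
DIALS_IN = [2, 3, 4, 6, 8]
DISTANCES_FT = [6, 9, 12, 18, 24]

angles = [
    visual_angle_arcmin(diameter, feet * 12)
    for diameter, feet in zip(DIALS_IN, DISTANCES_FT)
]

if __name__ == "__main__":
    for diameter, feet, angle in zip(DIALS_IN, DISTANCES_FT, angles):
        print(f"{diameter} in. dial at {feet} ft: {angle:.1f} min of arc")
    spacing = size_for_angle(2.5 * 12, 10)
    print(f"10 min of arc at 2.5 ft corresponds to {spacing:.4f} in.")
```

Under these assumptions every pairing has the same size-to-distance ratio, so each dial subtends the same angle; the same function could be applied to a scale interval rather than to the whole dial.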
GROUP CREATIVITY UNDER CONDITIONS OF
SUCCESS OR FAILURE AND VARIATIONS
IN GROUP STABILITY¹

ROBERT C. ZILLER, RICHARD D. BEHRINGER, AND JACQUELINE D. GOODCHILDS

Center for Research on Social Behavior, University of Delaware

This study deals with the creativity of groups with a history of
success or failure followed by 4 variations of group stability: the
addition, replacement, or removal of a member, and a control
condition in which the original group composition was maintained.
The 64 experimental groups were initially composed of either 2, 3,
or 4 persons, depending upon the experimental conditions; but
following the stability conditions, all groups were comprised of 3
persons. Following an induction of group success or failure, 1 of
the 3 variations of group composition change or the control
condition was effected. The creativity task involved composing as
many captions for a Saturday Evening Post cartoon as possible within
a given time limit. The results indicate (p < .05) that groups which
experience membership changes (open groups) were more creative than
the stable groups (closed groups).

According to Guilford (1959), a distinguishing characteristic of the
creative process is “divergent intellectual production” or the
generation of a variety of mental responses. This investigation
deals with the divergent intellectual productivity of laboratory
groups with a history of success or failure followed by any one of
four variations in group stability: the addition, replacement, or
removal of a member, and a control condition in which original group
composition was maintained. The study also investigates the group’s
perceived probability of goal attainment under the same experimental
conditions.

Many social scientists have long held that creativity flourishes in
societies where there is a continuous flow of new members and is
curbed in societies from which strangers are restricted. The usual
rationale for this proposition is that isolation results in highly
specialized conceptual frameworks and rigid limitations on the kind
of behavior deemed acceptable in a given situation (Redfield, 1947).
Sorokin (1927), however, proposed that in a horizontally immobile
society with its “permanent and monotonous environment, there is
little incentive for invention.” Can this theoretical framework
involving a societal frame of reference be extended to small groups?

¹ This report is an extension of a paper read before the American
Psychological Association meetings in Chicago, Illinois. The study
was supported in part under Contract NONR 85(02) with the Group
Psychology Branch, Office of Naval Research.

Indeed, the only empirical evidence supporting this hypothesis
involves small research teams in industrial laboratories (Shepard,
1955). The results suggest that group productivity, creativity, and
enthusiasm are negatively related to length of association of the
group members.

Conversely, and again with reference to large social units, Park and
Burgess (1921) have suggested that isolation is conducive to
creativity. They propose that contact with a vast number of other
ideas tends to encourage imitation. Recently, Rose and Felton (1955)
tested this latter hypothesis in “laboratory cultures.” The number
of individual responses to selected Rorschach cards provided the
measure of creativity. It was concluded that invention is curbed in
open societies, and that culture borrowing increases immediately
following experiences in new societies. These conclusions may be
disputed, however, since the instrument purporting to measure
creativity lacks empirical validation and the experiment actually
involved small groups rather than societies.

In the present laboratory study it was hypothesized that small open
groups are more creative than small closed groups. Open groups are
defined (Ziller, Behringer, & Jansen, 1961) as groups in which the
members are temporarily interrelated owing to unscheduled
replacements, removals, or additions of members. Closed groups are
defined as groups in which group composition remains unchanged.

It was proposed that under open group conditions where the
membership is in a state of flux, the members are less
interpersonally oriented. Since membership is transitory in open
groups, interpersonal orientation, individual status, and role
structure are less salient (Sorokin, 1927, p. 522). Energy would
only be dissipated in an endless effort to establish these
relationships. Faced with a task demanding a variety of mental
responses, it was anticipated that criticism arising from status
striving would be minimized in open groups in contrast to closed
groups. Thus, by minimizing captious criticism and accepting without
prejudice ideas and suggestions from all members regardless of power
and status, conditions conducive to creativity are induced (Stein,
1953).
Partial support for this hypothesis and theoretical framework is
suggested by Torrance’s (1955) study involving group decision making
in permanent and temporary three-man groups. It was found that
status differences were less prominent in temporary groups.
Moreover, it was reported that temporary groups submitted decisions
superior to those of permanent groups. The personnel of the
permanent groups had worked together for several months. Members of
the temporary groups were from similar permanent groups (military
air crews) but were reassigned to new groups for the purposes of the
laboratory experiment. Assuming that temporary groups are a special
case of open group conditions, the results of this study offer a
degree of support for the open-closed group theoretical framework
with reference to group decision making and suggest an extension of
the framework with reference to group creativity.

A second independent variable concerned the group history of success
or failure. Previous research (Deutsch, 1958) has indicated that
perceived probability of goal attainment is positively related to
initial experience of group success. The present experiment is
concerned with the interaction of the success-failure variable and
variations in group stability. Specifically, it was hypothesized
that following a history of failure, perceived probability of goal
attainment is greater in open groups; following a history of
success, perceived probability of goal attainment is greater in
closed groups.

In the second hypothesis it was assumed that the failure atmosphere
or attitudes constituting low group evaluation following group
failure are dissociated more readily if the membership is altered.
Furthermore, it was proposed that the disruption of group continuity
provides a convenient basis for dividing the group history into two
time segments, the past and the future, and for compartmentalizing
failure with the past. Indeed, following failure, a change of any
kind may provide a plausible basis for dissociating the past and
perceiving the future as a “new deal” or a new group era relatively
free from disturbing reminders of past unpleasantness (the New
Year’s Eve phenomenon). Change, per se, is efficacious. Any one of a
variety of changes may provide a rallying point or a convenient
defense mechanism which serves a therapeutic function for groups
embarrassed by memories of failure. Examples of the proposed group
dynamism abound at the “common sense” level of psychology: the
replacement of the coaches of football teams with disappointing
records; and, in general, the expressed need for “new blood” in a
dyspeptic organization.

On the other hand, it was proposed that 
groups with a history of success attempt to 
minimize change in order to preserve the 
identity of the group and the members’ perceptions of the nebulous
but accepted “winning combination.” It is difficult for group
members as well as for professional social scientists to define the
basis of group success. Thus, in the event of group success, it may
be assumed that the group members will tend to adhere in every
detail to the previously successful pattern of group behavior. Since
this is impossible under conditions of membership change, a change
in group composition may be perceived as potentially corrupting or
as a threat to the group’s rather tenuous but, nevertheless,
successful pattern of behavior.





METHOD

Subjects

A total of 192 male students from the freshman and sophomore
physical education classes at the University of Delaware
participated in the laboratory experiment.

Procedure

At the outset, the 64 groups comprising the sample were composed of
either two, three, or four persons depending upon the experimental
variation of group stability. The variations in group stability
included the addition of a new member to a two-person group, the
removal of a member from a four-person group, the replacement of a
member in a three-person group by a new member, and a control
condition in which the original three-person group remained intact
throughout the one-hour session. The first three conditions
operationally defined open groups; the fourth condition (the
control) was defined as a closed group. Following the experimental
changes in group composition, all groups were composed of three
persons.

Two groups of subjects performed simultaneously in separate rooms.
In collecting data, a “replacement” group was paired with another
replacement group; an “addition” group was paired with a
“subtraction” or “removal” group; or a control group was paired with
another control group. By this means the group problem solving
experiences of newcomers and regular members were controlled to a
degree, and data collection was expedited. For example, the member
selected to leave the four-man removal group became the third member
of the addition group. The member to be transferred was selected
randomly from the group on the basis of a high card draw.
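The arithmetic of this design can be checked with a small sketch. It is only an illustration: the equal allocation of 16 groups to each stability condition is an assumption not stated in the text, and the function names and the way the high card draw is simulated are hypothetical.

```python
import random

# Initial group sizes for the four variations of group stability.
INITIAL_SIZE = {"addition": 2, "removal": 4, "replacement": 3, "control": 3}


def total_subjects(groups_per_condition=16):
    """Subjects needed if each condition has the same number of groups
    (equal allocation is an assumption made for this sketch)."""
    return sum(size * groups_per_condition for size in INITIAL_SIZE.values())


def high_card_draw(members, rng):
    """Pick one member at random: each draws a distinct card rank and
    the holder of the highest rank is selected."""
    ranks = rng.sample(range(1, 14), len(members))
    return members[ranks.index(max(ranks))]


def transfer_member(removal_group, addition_group, rng):
    """Move the selected member of a four-person removal group into the
    paired two-person addition group."""
    mover = high_card_draw(removal_group, rng)
    remaining = [m for m in removal_group if m != mover]
    return remaining, addition_group + [mover]


if __name__ == "__main__":
    rng = random.Random(0)
    print("Subjects required:", total_subjects())
    print(transfer_member(["A", "B", "C", "D"], ["E", "F"], rng))
```

With 16 groups per condition the four initial sizes account for 64 groups and 192 subjects, which agrees with the figures reported above, and the paired transfer leaves both groups with three members.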


First, the groups were given a Meier Art Judgment item and a task
which required a group estimate of the number of dots on a briefly
exposed slide. A success-failure induction and one of the three
variations of group composition change (or the control condition)
were then effected. Following these experimental manipulations, the
creativity task was presented. Finally, subjects completed a
questionnaire concerning their attitudes toward the group and their
estimate of group performance.
Changes in Group Composition

Part of the directions under the replacement, addition, and removal
conditions were as follows:
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1 change in group members will be made 
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has worked on these 


group an join another group 
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Same 


the other group’s members 




fashion as the 
member leaving this 

Addition: Before working on the next task a change in group
membership will be made. We are going to add another person to the
group. This person being added has been working on these same tasks
with another group in the next room. The person was selected at
random from that group to join this group.

Removal: Before working on the next task a change in group
membership will be made. We are going to select at random one of the
group to leave the group and join another group which has been
working on these same tasks.

The selection of the replacement, addition, and removal members was
decided by means of a high card draw with playing cards among the
members.
The Success-Failure Inductions 


ups were informed that 
Art Judgment 


task wrong 


was 


dot-estimate task the group per 


than 1 of the groups who 


ticipated in the past that 


out ot 1 
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ive previously 


pertormed better group.’ Similarly 


cesstul tl 
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omewhat amusing 
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minutes 
This score was presumed to be a measure of “ideational fluency.” In
arriving at the second score, the population of suggested cartoon
captions was rated according to a five-point scale of humor. The
ratings were forced into, approximately, a normal distribution.


roups scor 


kind have 
nality 


Guilford is devices for me

The cartoon depicts a sculptor seated in dejected attitude at the
base of an enormous and almost completed statue of an elderly
prestigious gentleman. A horrendous vertical fissure splits the work
from head to base. A second man is pictured standing beside the
artist.

These measures were suggested through personal communications with
J. P. Guilford and P. R. Merrifield of the University of Southern
California.

of the ratings of the listed captions on the group’s record sheet.
This latter score was presumed to be a measure of group originality
(Wilson, Guilford, & Christensen, 1953).

A measure of perceived probability of goal attainment was obtained
from the following questionnaire item: How would you compare the
number and quality of these ideas your group created about the
cartoon with those created by other groups? Better than what
percentage of the other groups?
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m rang 


rhe alternative lor ne ove itt 
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1 to 90° in interval | The sun 


members’ weighted responses provided a 
rating ol ¢ 
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concerning group satisfaction wet 
a) My opini vas ays given the utmost cor 
sideration mely tense throu 


were fur d) If 
With regard to the originality component of divergent thinking, the
differences among the four variations in group stability were
statistically significant (p < .05); and open groups received higher
originality scores than closed groups (p < .05; see Tables 3 & 4).
Open groups also submitted a greater number of captions (ideational
fluency) than closed groups (p < .05; see Table 5).

In order to test the second hypothesis, it was necessary to
fractionate the interaction variance involving the group stability
and success-failure variables and to remove the interaction
component involving the success-failure and open-closed variables
(Snedecor, 1946, p. 404). The results were not statistically
significant.

DISCUSSION 


Hypothesis 1 states: Open groups are more creative than closed
groups. The hypothesis was confirmed both with regard to ideational
fluency and originality components of creativity (see Tables 2, 3,
& 4). The original theoretical framework assumed that there is less
interpersonal constraint in open groups; a condition presumed to be
conducive to creativity. Several alternative explanations deserve at
least cursory consideration. For example, a change in group
membership necessarily presents a novel visual or social field. It
has been held (Glanzer, 1953; Montgomery, 1952) that the initial
reaction to a novel stimulus is one of orientation, exploratory
behavior, and renewed activity. Thus, a change in group composition
may have stimulated and augmented group interaction and resulted in
increased productivity. Unfortunately, however, group interaction
was not recorded, thereby precluding a test of this intervening
group process.*

* Evidence from a subsequent experiment now in the analysis stage
indicates that group interaction rate increases following a change
in group composition.

An examination of the experimental directions pertaining to changes
in group composition suggests a more parsimonious explanation of the
results. It was observed that the directions to the open groups
called their attention to another group which was working
simultaneously on the same problems. The closed groups knew there
was another group in the next room who were also being studied but
they did not know they were involved in a similar experiment. Thus,
an intergroup competition or comparison factor may have been
introduced unwittingly into the experimental design. In these terms
the results of the present experiment suggest that groups whose
performance is to be compared to another group tend to be more
creative than groups which perform without suggesting intergroup
comparison.

This latter interpretation suggests social facilitation at the group
level. Just as Allport (1924) found an increase in scores from the
individual to the “together” situation; so in the present experiment
an increase in performance was found from the absolute group
conditions (closed group) to the relative group conditions (open
group).

Hypothesis 2 states: following a history of failure, perceived
probability of goal attainment is greater in open groups; following
a history of success, perceived probability of goal attainment is
greater in closed groups. The hypothesis was not supported by the
results (see Table 1). Perhaps, however, a single change in group
composition did not have sufficient impact to influence the induced
group perception of competence. A larger number of changes in
composition is suggested.


The limitations of this experiment need scarcely be emphasized.
While social scientists have speculated extensively concerning the
first hypothesis in particular, there is little agreement. Indeed,
alternative theoretical frameworks are offered with regard to the
findings of this study. Experimental investigations of the topic are
few; thus, comparative analyses are limited. Finally, the present
experiment attempted to telescope time under the controlled
conditions of a laboratory setting. Yet, it must be noted that the
results of the present laboratory experiment support, to a degree,
those of Shepard’s (1955) field study which was described earlier.

THE PREDICTION OF JOB PERFORMANCE¹
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\ battery of noncognitive tests was developed t ro prediction Navy 
enlisted men’s performance evaluations. Reported are the results of concur 
nt validity study and 2 follow ip studies with inte Is o 1d 30 months 
verformance evaluations 
ind 117 nuclear power per onnel 
st were independent of the Navy 
with the exception oO! the speeded clerical coding test; 
efficient in identifying men categorized as Below Averi 
c) tests attempting to measure persistence beyond 
decisivene ind lack of insolence yielded significant p1 


Composit ilidities about .40 were obtained in 


Like most large organizations, the Navy requires that on-the-job
performance evaluations be made of its personnel. These ratings are
made semi-annually for enlisted men and are used in a wide variety
of personnel procedures such as promotion, counseling, and selection
for special duty assignments. Again, like most large organizations,
the Navy has not had a great deal of success in developing
predictors of these evaluations.

For the most part, success of selection and assignment procedures
for enlisted men has been validated in terms of attrition and
achievement levels at Navy technical training schools. This
procedure implicitly assumes that men who demonstrate the greatest
proficiency at school will perform most adequately on the job.
However, recent evidence (Mackie, Wilson, & Buckner, 1954; Swanson,
1955) shows that neither scores of the Basic Test Battery (BTB), the
Navy’s measures of general ability, nor grades in technical training
schools, are highly related to performance evaluations. Apparently
the selection of those who will be most successful in school does
not insure the selection of those who will accomplish what their
superiors expect of them on the job.

Presented here are the results of an attempt to predict on-the-job
performance evaluations of Navy enlisted men with a battery of
noncognitive tests. The thinking underlying the development of the
tests has been presented in detail elsewhere (Glickman & Kipnis,
1961; Kipnis, 1960). Briefly, it was assumed that both the processes
of selection and of training serve to reduce noticeable differences
in technical ability among men and increase the supervisor’s
attention to motivational characteristics of his subordinates. It
was further assumed that a supervisor is most influenced by
behaviors in subordinates that show support for himself and his
goals. By supportive behaviors are meant behaviors that reflect a
willingness to accept the supervisor’s influence or which promote
the supervisor’s confidence in the subordinate’s ability to carry
out the tasks set for him.

These general assumptions then provided the rationales for
developing the noncognitive tests. The results of two validity
studies with intervals of approximately 14 months and 30 months
between testing and evaluations of performance are given in this
report.

Predictor

Hand
self-generat
going” beyond minimum performance “quot
tiring tasks consists of sequentially numbered boxes in which Ss
pencil tally marks

1 The view p ( iecessarily represent the of th d Stat

RESULTS
Pilot Study 


In 1956 the experimental battery was tried 
out with a group of 125 Navy Aviation Ma- 
chinist Mates, Third Class (Kipnis & Glick- 
man, 1958). Official performance evaluations 
were obtained from records and unofficial 
evaluations were collected from each man’s 
supervisor. The tetrachoric correlation of the 
two was .62. These two sources of information 
were summed and the combined distribution 
dichotomized at the mean to provide the 
criterion. 

Biserial correlations between the predictors 
and the criterion were computed. Correlations 
significant at the .10 level of confidence or 
better were obtained for all experimental tests 
except the Leadership Support Scale, which 
was dropped from further analysis. 
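The biserial coefficient used throughout these analyses can be sketched as follows. This is a minimal illustration of the textbook formula, r_b = ((M_upper - M_lower) / s) * (pq / y), where p and q are the proportions in the two criterion categories, s is the standard deviation of all predictor scores, and y is the ordinate of the unit normal curve at the point of dichotomy. The function name and the sample data are hypothetical and are not taken from the study.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(scores, in_upper_group):
    """Biserial correlation between continuous predictor scores and a
    criterion that has been dichotomized into upper and lower groups."""
    upper = [s for s, flag in zip(scores, in_upper_group) if flag]
    lower = [s for s, flag in zip(scores, in_upper_group) if not flag]
    p = len(upper) / len(scores)   # proportion in the upper group
    q = 1 - p                      # proportion in the lower group
    # Ordinate of the standard normal curve at the cut separating p from q.
    unit_normal = NormalDist()
    ordinate = unit_normal.pdf(unit_normal.inv_cdf(p))
    return (mean(upper) - mean(lower)) / pstdev(scores) * (p * q) / ordinate


if __name__ == "__main__":
    # Hypothetical test scores and criterion categories (1 = upper group).
    test_scores = [12, 15, 9, 20, 14, 11, 18, 16, 10, 13]
    criterion = [0, 1, 0, 1, 1, 0, 1, 1, 0, 0]
    print(round(biserial_r(test_scores, criterion), 3))
```

Because the coefficient assumes a normally distributed variable underlying the dichotomy, it can exceed 1.00 in small or markedly non-normal samples, which is one reason such coefficients are usually cross-validated.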

Contrary to expectations, a negative rela- 
tionship was found between the criterion and 
the Risk Scale. Seeking clarification, the Risk 
Scale was item analyzed. It was found that 
low criterion men more often described them- 
selves as starting at an early age to drink, 
smoke, play poker, dance, be interested in sex, 
hitchhike, and take overnight trips without 
their families. They were more likely to take 
dares and to report having done cruel things 
as a child and having been punished regularly 
for bad conduct in school. In general, the self- 
descriptions portrayed physically active, ag- 


gressive, somewhat reckless personalities, who 
early in life had become independent of fam- 
ily and school control and who have main- 
tained, we suspect, an independent and 
rebellious attitude toward most attempts at 
controlling their behavior. 

The discriminating items were incorporated 
into a scoring key labeled the Insolence Scale. 


First Follow-Up Study 


In 1957, the experimental battery was administered to two groups of
recruits (N = 141) entering Class A Radioman School (Kipnis &
Glickman, 1959). In this study the Leadership Support Scale was
deleted and the Insolence Scale added. The subjects had been in the
Navy approximately 2 months.

Fourteen months after initial testing, supervisors were requested to
evaluate each subject as to: Willingness to Work, Technical
Competence, Respect for Authority, Ability to Get Along with
Shipmates, and Overall Acceptance of the Man by His Supervisor.
These scores were summed to give a criterion score. Evaluations were
obtained for 128 subjects. Median time in the current assignment was
6 months.

Based upon their criterion scores, men were put into two categories:
Below Average (the bottom third of the distribution) and Average or
better. The biserial validity coefficients are given in Table 1.

Experimental tests

Hand Skills Test
Error Finding Test
  Number Items Hedged
  Number Items Right One Try
Color Naming Test
  Part 1
  Part 2
Risk Scale
Insolence Scale
Sports Scale

* Significant beyond the .05 level.
** Significant beyond the .01 level.

TABLE 2

BISERIAL CORRELATIONS BETWEEN
NUCLEAR-POWER

Experimental tests

Performance Area

Maintain Equipment
Operate Equipment
Willingness to Take on Hard Duty
Sound Judgment and Ideas
Military Appearance
Stay Calm in an Emergency
Work without Supervision
Get Along with Shipmates
Overall Effectiveness

* Significant beyond the .05 level.
** Significant beyond the .01 level.

Significant prediction was obtained with the Hand Skills Test
(r = .30, p < .01), the Number Right Key of the Error Finding Test
(p < .05), the Insolence Scale (r = .24), and CLER (p < .05).
Equally weighted combinations of the three experimental tests
(Guilford, 1950, p. 463) yielded a composite correlation of .40 with
the criterion. The addition of CLER to this composite raised the
validity to .41.

None of the tests of the BTB correlated with the experimental
battery, with the exception of CLER. This latter test correlated
significantly with the Hand Skills Test (r = .22), the Number Right
Key of the Error Finding Test (r = .38), and Parts 1 and 2 of the
Color Naming Test (one coefficient being r = .54).
Second Follow-Up Study 


In the winter and spring of 1957-58, the 
noncognitive battery was administered to 261 
subjects entering the United States Naval 
Nuclear Power School, ranging from Seamen 
to Chief Petty Officers (Kipnis & Glickman, 
1961). Because of limited testing time, the 
Insolence Scale and the Sports Scale were not 
administered. CLER scores were not available 
for this sample. 
Thirty months later, a record search located 124 men aboard nuclear powered submarines. Forms were sent to the supervisor of each man requesting that he be evaluated in: Ability to Maintain Equipment, Ability to Operate Equipment, Willingness to Take on Hard Duty, Sound Judgment and Ideas, Military Appearance, Ability to Stay Calm in an Emergency, Ability to Work without Close Supervision, Ability to Get Along with Shipmates, and Overall Job Effectiveness. Evaluations were received for 117 subjects. 

Product-moment intercorrelations of the 
nine performance scales ranged from .42 to 
.90, with a median of .66. 

In each performance area, two groups were formed: Below Average (approximately the bottom third of each distribution) and Average or better. The biserial validity coefficients are given in Table 2. 

It can be seen that the noncognitive tests were significantly related to performance evaluations. The Error Finding Test (Number Right Key) predicted all components of performance, with the exception of Military Appearance and Getting Along with Shipmates. The seven significant validities for this test ranged from ... to ..., with a median validity of .26. The number of items hedged on the Error Finding Test also gave significant validities in the expected negative direction, but this key was generally less valid than the Number Right Key (i.e., correct items that were not hedged). 

The Hand Skills Test most clearly predicted evaluations of technical performance factors (Maintenance and Operation of Equipment), although significant validities were also obtained against evaluations of Getting Along with Shipmates and of Overall Job Effectiveness. 

Parts 1 and 2 of the Color Naming Test were both significantly related to the performance evaluations, each part predicting differing components. Part 1 tended to predict more components than Part 2 and also tended to have higher validities, suggesting that it was the more valid score in this situation. 

BTB scores exhibited practically no relationship with the evaluations. Of the 27 possible correlations between the three tests of the BTB and the nine performance scales, only one significant prediction was obtained, between GCT and Ability to Maintain Equipment (r = .25, p < .05). 

Correlations of the BTB with the noncognitive tests were nil. 
Equally weighted combinations of the valid noncognitive tests yielded composite correlations with the various performance areas ranging from .30 to .45, as shown in Table 3. The best prediction was obtained against evaluations of technical components of performance (Ability to Maintain and Operate Equipment) as well as against evaluations of Overall Job Effectiveness, which would of course embrace the foregoing. In general, the most efficient two-test composite was Error Finding plus Hand Skills. The relatively high correlation between the Color Naming Test and the other two experimental predictors limited its contribution. 


Battery Reliability 


The following estimates of test reliability were derived from a sample of 100 enlisted submariners, who were tested upon entrance to enlisted submarine school. The corrected split-half reliability for the Risk Scale was .76, for the Insolence Scale .50, and for the Sports Scale .83. Subsequent revisions of the Insolence Scale have added extra items which should increase reliability. 

The Hand Skills Test has three parts of 4 minutes each. The average intercorrelation between the three parts was .86, indicating a high degree of individual consistency in taking the test. Because the Color Naming and Error Finding Tests are highly speeded, no split-half reliabilities were computed. A correlation of .67 between Parts 1 and 2 of the Color Naming Test may be taken as a lower-bound reliability estimate. 
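
A "corrected" split-half reliability is ordinarily obtained by stepping up the correlation between the two halves with the Spearman-Brown formula. The paper does not name the correction it used, so the sketch below is an assumption about the method. The function name is hypothetical.

```python
def spearman_brown(r, k=2):
    """Estimated reliability of a test lengthened k times, given the
    correlation r between comparable parts (k = 2 for split halves)."""
    return k * r / (1 + (k - 1) * r)

# A half-test correlation of .50 steps up to about .67 for the full test.
print(round(spearman_brown(0.50), 2))

# Illustration only: three parts with an average intercorrelation of .86.
print(round(spearman_brown(0.86, k=3), 2))
```

Applied to the three parts of the Hand Skills Test, the same formula would give a figure of about .95 for the total score. The paper itself reports only the average intercorrelation of .86.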


DISCUSSION 


The findings of the two studies support the belief that noncognitive tests of motivation can make a practical contribution to the prediction of performance evaluations. Of the original six experimental tests, the Hand Skills Test, an attempt to measure persistence beyond minimum standards, and the Error Finding Test, 
yielded 
.75 was obtained between supervisors’ evaluations of Willingness to Take on Hard Duty and of Sound Judgment and Ideas. This high overlap between scales prohibits ascribing unique meaning to any one trait evaluation and, consequently, does not provide the fine 
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mining construct validity 
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Error Finding 
ITEM SORTS VERSUS GRAPHIC PROCEDURE FOR 
OBTAINING THURSTONE SCALE JUDGMENTS 


LILA CORKLAND SIEGEL AND LAURENCE SIEGEL 


Miami University 

An investigation of the stability of median and Q values computed from graphically derived and from sorted judgments used in attitude scaling by the method of equal appearing intervals. A group of Ss (N = ...) judged 135 statements using a 9-point graphic continuum. A 2nd group (N = 412) sorted the identical sequence of statements presented as a card deck into 9 piles. Graphic judgments tended to yield higher Q values than sorts for relatively unambiguous items. Correlations expressing confluence of final decision regarding specific items were .99, .74, and .53 when “acceptable” Q values were below 2.0, 1.7, and 1.5, respectively. The medians derived from the procedures correlated .97. The findings were interpreted in the light of a presumed heightened focusing effect of the sorting procedure for making judgments. 


The research reported herein was a miscellaneous by-product of a study of the Instructional Gestalt supported by a grant from the Office of Education, United States Department of Health, Education, and Welfare. 


One widely used technique for developing attitude scales is based upon the method of equal-appearing intervals. The procedure, as originally described by Thurstone and Chave (1929), requires that each of the preliminary statements be printed on a separate card. This deck of cards is then presented to judges with instructions to sort the statements into piles corresponding to points on a continuum of equal-appearing intervals. The resultant distribution of judgments for each statement is summarized by computing the median, which is taken as the “scale value,” and the interquartile range (Q) as a measure of “ambiguity.” Items of low ambiguity and with appropriate scale values are selected from the preliminary pool for inclusion in the final form of the scale. 

The requirement that judges sort individual statements into piles necessitates the use of a rather large work surface and therefore often precludes obtaining judgments simultaneously from larger groups of persons. Furthermore, this judgmental process must be followed by the laborious preparation of hand tallies summarizing the distribution of judgments for each preliminary statement. 

Attempts have been made to circumvent either or both of these difficulties by utilizing some variation of graphic ratings rather than sorting. The judges are presented with a list of the preliminary statements and instructed to rate each one on a graphic continuum. Since actual sorting is not required, this technique for obtaining judgments is readily amenable to group administration. This general procedure has been reported by several authors including Seashore and Hevner (1933), Ballin and Farnsworth (1943), and Edwards and Kenney (1946). In order to simplify the preparation of judgmental summaries, Webb (1951) had his subjects record their judgments on standard IBM answer sheets. The judgments so obtained are easily tabulated by means of the graphic item counter attachment to the IBM test scoring machine. 
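
The two summary statistics on which the method of equal-appearing intervals rests, the median (scale value) and the interquartile range Q (ambiguity), can be sketched from a tally of judgments as follows. The interpolation assumes each pile i covers the interval from i - 0.5 to i + 0.5, which is the usual grouped-data convention but is not confirmed by the text. The function names and the tally are hypothetical.

```python
def grouped_percentile(freqs, p):
    """Interpolated percentile point for a tally of judgments.

    freqs[i] is the number of judges who placed the statement in
    pile i + 1; pile i is treated as the interval (i - 0.5, i + 0.5).
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("no judgments tallied")
    target = p * n
    cumulative = 0
    for index, f in enumerate(freqs):
        if f > 0 and cumulative + f >= target:
            lower_limit = (index + 1) - 0.5
            return lower_limit + (target - cumulative) / f
        cumulative += f
    return len(freqs) + 0.5  # reached only through rounding at p = 1

def scale_value_and_q(freqs):
    """Return (median, Q) for one statement."""
    median = grouped_percentile(freqs, 0.50)
    q = grouped_percentile(freqs, 0.75) - grouped_percentile(freqs, 0.25)
    return median, q

# Hypothetical tally for one statement on a 9-point continuum.
tally = [0, 0, 0, 10, 20, 10, 0, 0, 0]
print(scale_value_and_q(tally))
```

A statement whose judgments cluster in a few adjacent piles gets a small Q, while one that judges scatter across the continuum gets a large Q and would normally be dropped from the final scale.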


The critical issue in studies comparing item sorts with graphic procedures for obtaining Thurstone scaling judgments is the effect of judgmental procedure upon the resultant median and Q values. A number of studies (Edwards & Kenney, 1946; Farnsworth, 1943; Webb, 1951) involved replication with a graphic procedure of results obtained years earlier with the sorting procedure. These studies uniformly yielded high correlations (.95 to .99) between the medians obtained from the two methods in spite of the lengthy time intervals between the original sorts and the replication with graphic procedures. Occasional shifts in median values for particular statements were noted, but it was impossible to ascertain whether these shifts were a function of the judgmental procedure or the elapsed time between the judgments originally and subsequently secured. Edwards (1957) concludes in summarizing the results of these studies that regardless of the method for securing judgments, the relative ordering of statements along the psychological continuum remains quite stable. 

The available data for stability of Q values obtained from sorting and graphic procedures are much less reassuring than for the stability of medians. Edwards and Kenney (1946) reported that Q values derived from a modified graphic procedure correlated only .18 with those obtained from sorts, and Webb (1951) reported a Q value correlation for the two procedures of .42. If these data are accepted at face value, it would appear that although graphic judgmental procedures are simpler to administer and summarize than item sorts, items selected for inclusion in the final form of the attitude scale on the basis of these procedures ... quite different. 

Such a conclusion is not warranted, however, because of the ... between the sorted judgments and the graphically derived judgments (... in each instance). Furthermore, ... study was not intended as a test ... stability of Q ... his correlation was based upon Q ... obtained from a 5-point graphic scale ... an 11-point sort scale. 


PROBLEM 


present stu Vi signed to in 

he stability of median and O values 
judgmental 
between administrations and 


1 249 
across procedures with a zero- 


time interval 


ntrolli eye } > ’ +} 
controlling for the size of ne 


con 
tinuum 

The three hypotheses investi 
the 
using a 


derived from same general 


judges graphic pri 
examine each statement some 
cally than judges sorting stateme1 


ally presented on cards. Starting 


premise it was possible to spec if\ 
tion of our anticipated findi 


We can that 


sented toa 


ngs 


assume tatement pre- 


group of juds ; a hypothetical 


“true” position on 1e equal appearing intet 


inuum judges art isked, in 




effect, to indicate this true position. In ideal circumstances, eliminating chance errors, using a totally unambiguous statement and the best conceivable judges and instructions, all judges ought to perceive this true value and rate the statement accordingly. Variability of the judgments under these conditions would be zero. 

Since this ideal set of circumstances is hardly ever approximated in practice, judgments of a statement typically are distributed in some fashion. A portion of the variance of this distribution can be attributed to “error” or “chance” and is not of concern in the present paper. The remainder of this variance is attributable to imperfections in the statement and the judging complex. Imperfections in the statement per se lead to under- and overestimates of the statement’s “true scale value” and hence constitute that statement’s “true ambiguity.” Imperfections in the judging complex may act artificially either to inflate or deflate estimates of this true ambiguity. 

We were 
sper interested in the effects 
of imperfection (i.e 

i i [ State 
char 
sorting procedures 


more 
his imperfection would 
imates of the statement’s 
rela- 
esti- 


true ambiguity wl he statement is 
tively unambiguous, () deflate such 
for moderately ambiguous statements, 


1 } 


t 
be of little consequence for highly 


nents. The rationale for these 


ambiguous state 
expectations follows 

: So 
ritical judging 


2 ‘ . ; ‘ 
Consider th itects of an un 


} 


ittitude upon the distribution of judgments 


+ 


unambiguous statement. 
isonable care ought cor- 
. : 
the 
lose to its hypothetic il 
The 
judgments would have a low O value. Less 
careful judges 


overestimate the s 


intent of tatement 


true 


ver’ ( 


cale position resultant distribution of 


how } 
nowever, 


probably under- o1 


tatement’s true scale posi 


tion by some margin. This would be reflected 
in inflated O values 
We anticipated 
trend 
] 


thi 


reversed for moderately am 


that the direction of 


would be 


ig@uoUs tatements. Certain subtleties in 





GRAPHIC PROCEDURES FOR THURSTONE SCALE JUDGMENTS 


‘hraseology responsible for a moderate level graphic procedure. The relative 
° I 

of ambiguitv would probably not be perceived exercised by the judges was 
; | . | ; ] 

unless judges examined the statement rather fluence the variability of the 


critically. Thus we expected larger computed judgments but not its other characteri 


ne expect 


QO values when such statements were judged A third hypothesis expresses 1 


carefully than when they were judged sum tion as it rel: to the other 


marily. characteristic (i.e., the median 
As the true ambiguity incre: till fur- significant for Thurstone scaling 

ther, we would expect that both critical and Hypothesis 3. The medians 

noncritical judges would be similarly led | items judged by the 


the phraseology to scatter their judgment yrocedures with a 


} 


| 
vielding distributions with large Q values. the application of 


In the light of our original premise about 
= 


the difference in care exercised by iu 


| 


ca. } ] } i . PROCEDI 
using the two procedures under considera 


it ined OVE led 


tion, the reasoning out 
formulation of: 

Hy pothesi I The relationship betwee 
computed O values obtained from the g iphic 
and sorting procedures will be nonlinea 
graphic procedt 
O values than the 
biguous statemen 

* 


sort for moderately 


and about the 


cidence of decisions 
or reject items judgec 
under consideration 


pothesis 1 would lack re i] unles 


differences in the computed O values resultin 
from these procedures ied 1 a large number 
of discordant decisi i 
final version of 
regard w 
Hy pothe S78 
ance level for O 
coin idence 
retain or dis 
anti ip ited tha 
cedures would |] ret f virtually 
| statements when the permissible Q 


identical stateme 


] 


alue was moderat lefinition 


ely 
of “nermi sible O valu ) » increasi \ 
tringent, an increasing numb 
decisions was anti ip ited 

The two hypotheses 
fect of judgmental proce 
The \ were derived fr 


sorting eads the i] | { imine 


statement more irefully han does the 
The scattergram of Q values obtained from these procedures is shown in Table 1. 

Linearity of the regression was studied by comparing the computed Pearson correlation with the computed correlation ratio (eta). Pearson r for these data was .46; eta (corrected for grouping) was .71. The ratio of zeta to its standard error was computed as a test of linearity with a resultant value of 3.22. Hence the probability of a linear relationship is exceedingly low (p < .001). 
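
The comparison of r with eta can be sketched as below. Zeta is taken here as eta squared minus r squared, the quantity conventionally used in this test of linearity. The sketch omits both the correction for grouping and the standard error of zeta, so it does not reproduce the paper's ratio of 3.22. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def eta_squared(x_classes, y):
    """Squared correlation ratio of y on x (uncorrected for grouping):
    between-class sum of squares divided by total sum of squares."""
    grand = mean(y)
    classes = {}
    for cls, value in zip(x_classes, y):
        classes.setdefault(cls, []).append(value)
    ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in classes.values())
    ss_total = sum((value - grand) ** 2 for value in y)
    return ss_between / ss_total

def zeta(x_classes, y):
    """Excess of the curvilinear over the linear relationship."""
    return eta_squared(x_classes, y) - pearson_r(x_classes, y) ** 2

# Hypothetical inverted-U data: no linear trend, perfect curvilinear fit.
x = [1, 1, 2, 2, 3, 3]
y = [1, 1, 4, 4, 1, 1]
print(pearson_r(x, y), eta_squared(x, y), zeta(x, y))
```

With the values reported above, eta squared minus r squared is .71 squared minus .46 squared, or about .29, which is the kind of gap that signals a markedly nonlinear regression.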

Definitions of specific levels of ambiguity are necessary if we are to interpret this nonlinear relationship in accord with Hypothesis 1. As stated earlier, we assumed that the sorting procedure encourages a more critical judgmental attitude. Hence Q values derived from the sorts were taken as the best available approximations of “true” ambiguities. Statements with sort-derived Q values of 1.4 or less were regarded as “relatively unambiguous”; between 1.5-1.7 as “moderately ambiguous”; at or above 1.8 as “highly ambiguous.” 
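
The trichotomization just defined can be written out directly. The sketch assumes that Q values are rounded to one decimal before classification, since the stated categories leave no room for values such as 1.45. The function name is hypothetical.

```python
def ambiguity_level(q_sort):
    """Classify a statement by its sort-derived Q value."""
    q = round(q_sort, 1)  # assumption: Q reported to one decimal place
    if q <= 1.4:
        return "relatively unambiguous"
    if q <= 1.7:
        return "moderately ambiguous"
    return "highly ambiguous"

for q in (1.2, 1.5, 1.7, 1.8):
    print(q, ambiguity_level(q))
```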

Using this arbitrary trichotomization, it is apparent that the graphic procedure tends to inflate Q values of relatively unambiguous statements. There is some tenuous evidence also that it tends to underestimate Q values for moderately ambiguous statements. Although the data for highly ambiguous statements are somewhat equivocal because of the relatively small number of such statements, no systematic tendency toward either inflation or deflation of Q values is apparent. 

It would be perfectly possible, of course, to interpret these data quite differently by assuming that the graphic procedure rather than the sort yields a more valid approximation of ambiguity. We would then have to conclude that the sort artificially inflates Q values for relatively unambiguous items and deflates Q values for relatively ambiguous items. However, the writers find the assumption underlying such a conclusion untenable. 


TABLE 1 
SCATTERPLOT OF Q VALUES 
(cell frequencies not recoverable from this copy) 


TABLE 2 
(title and entries largely illegible in this copy: ITEMS ... OF Q VALUE ...) 


Hypothesis 2 was concerned with the practical implications of the nonlinear relationship between graphically derived and sort-derived Q values. The issue under consideration was the confluence of decisions about whether to accept or reject specific items on the basis of the Q values computed from each set of judgments. 

Three arbitrary criteria of “acceptable Q value” varying in stringency were employed for this analysis: i.e., Q values below 1.5, below 1.7, below 2.0. The number of items “accepted” and “rejected” utilizing each of these arbitrary criteria is shown as a function of judgmental procedure in Table 2. 

It is apparent that the disposition of individual items (i.e., retained or rejected) is highly correlated for the two judgmental procedures even when the criterion of acceptable Q value is moderately stringent. Retaining only items with Q values below 1.7 produced a correlation coefficient between procedures of .74. This coefficient increased to .99 when the Q value criterion was set at the more liberal, but still tolerable, limit of 2.0. 
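
The agreement between the two procedures in accepting or rejecting items can be illustrated by cross-tabulating the decisions at a given Q criterion and correlating them. The paper does not state which coefficient it computed for the fourfold table. The phi coefficient is used here purely as an illustration and should not be expected to reproduce the reported values. The function name and data are hypothetical.

```python
from math import sqrt

def decision_phi(q_graphic, q_sort, cutoff):
    """Phi coefficient between accept/reject decisions made from two
    sets of Q values, accepting an item when its Q is below cutoff."""
    both_accept = graphic_only = sort_only = both_reject = 0
    for qg, qs in zip(q_graphic, q_sort):
        accept_g, accept_s = qg < cutoff, qs < cutoff
        if accept_g and accept_s:
            both_accept += 1
        elif accept_g:
            graphic_only += 1
        elif accept_s:
            sort_only += 1
        else:
            both_reject += 1
    denominator = sqrt(
        (both_accept + graphic_only) * (sort_only + both_reject)
        * (both_accept + sort_only) * (graphic_only + both_reject)
    )
    if denominator == 0:
        return float("nan")  # one procedure accepted (or rejected) everything
    return (both_accept * both_reject - graphic_only * sort_only) / denominator

# Hypothetical Q values for six statements under each procedure.
q_graphic = [1.2, 1.6, 1.9, 2.3, 1.4, 2.6]
q_sort = [1.1, 1.8, 1.6, 2.4, 1.3, 2.5]
for cutoff in (1.5, 1.7, 2.0):
    print(cutoff, round(decision_phi(q_graphic, q_sort, cutoff), 2))
```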
Thus, if one is willing to use a Q value criterion of 2.0 in developing a nine-point scale, the decision about the judgmental procedure to use ought to be predicated primarily upon considerations of ease of administration and tabulation. These factors generally favor the graphic procedure. If, however, one chooses to use a more stringent Q value criterion, the graphic procedure will lead him to reject a disproportionate number of statements that would have been adjudged relatively unambiguous by sort, and to accept a disproportionate number of statements that would have been adjudged relatively ambiguous by sort. 

The focus of the analysis was shifted for Hypothesis 3 from a study of the effects of judgmental procedure upon Q values, to a study of the resultant scale values. The data clearly do not refute the null hypothesis. The distributions of scale values resulting from both the graphic and the sorting procedures had identical means and standard deviations (4.9 and 2.1, respectively). The Pearson correlation between the scale values assigned by the two procedures was .97. 
COMPARATIVE FACTOR ANALYSES OF CLERICAL JOBS 


ALBERT B. CHALUPSKY 


One of the critical steps in conducting industrial personnel research is the identification of job requirements. Historically, this process of job analysis has been directed toward individual jobs rather than broad work areas. Consequently, though a great deal of specific job information is in existence, much remains to be learned about the fundamental nature of work components and their interrelationships. Knowledge ... dimension could lend support to various aspects of personnel research. ... importance of studying job relationships for determining selection requirements has been recognized for many ... Bingham (1935) describes a system ... classifying clerical jobs based primarily upon the number and kind of decisions made. 

Selection of workers, as well as promotion and transfer and the development of career guidance programs, would be facilitated if common denominators underlying individual job elements could be identified and used as the basis for job classification. Moreover, identification of work components which cut across job lines would appear to provide a much broader base for future occupational research in curriculum development, merit rating, and job evaluation. 

The need for information on job interrelationships has been particularly acute in the field of occupational classification. Faced with increasing ... for more effective matching of manpower resources with jobs available, the United States Employment Service (USES) has been actively working toward the development of a truly functional occupational structure. One avenue of ... has been the exploration of basic job components which could be used to characterize the similarities and differences ... jobs. The study reported here was ... in support of this effort. 

The specific purposes of the study were to explore the factors underlying worker functions and knowledge requirements of a sample of clerical jobs and to assess the potential utility of developing and factor analyzing experimental checklists as a basis for identifying common denominators among jobs. 
and the completion of various records and forms. Inspection of the clerical functions associated with this factor reveals additional activities involved and also points up some of the basic operations not readily apparent from the knowledge items. These include such functions as counting, physical handling, comparing, and estimating. 


TABLE 2 
... ON COMPARABLE ... SUPERVISION ... 
(entries largely illegible in this copy; surviving labels: Approves; Transfers to ...) 


TABLE 3 
(entries largely illegible in this copy; surviving labels: Balancing ...; Posting) 

Comparable Factor B (Supervision). In terms of the checklist of knowledge components, this factor is essentially a doublet, having loadings of .30 or above on only two items. These items are knowledge of Assigning duties to other workers and Reviewing activities of other workers. Among the clerical functions this factor loads significantly are Supervises, Instructs, and Assigns. Table 2 presents the items of both lists with loadings of .30 or higher on this factor. 

Comparable Factor C (Computation and Bookkeeping). Factor C has been called Computation and Bookkeeping as most of its items are concerned with such activities as arithmetic calculation, listing of data, and the compiling, organizing, and presenting of information in financial reports and records. Table 3 presents the variables of both checklists which this factor loads significantly. Items from the two lists which describe either the higher level activities, such as supervision, or the more menial tasks of clerical work, such as stockkeeping, are conspicuously unrelated to Factor C. 


TABLE 4 
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TABLE 5 


OF KNOWLEDGI 


FILING 


Shorthand 

Spelling, punctuation, grammar 
Organization and layout of correspon 
Operating typewriter 

Addre ssing Correspo! 

Maintaining file 

Inserting in files 

Filling in routine fort 

Placing telephone call 


Routi 


Comparable Factor D (Communications and Public Relations). Items pertaining to originating, analyzing, and disseminating information and representing various types of personal contact with customers and clients serve to identify this factor as Communication and Public Relations. The knowledges and functions which have significant loadings on Factor D are shown in Table 4. 

Stenography-Typing and General Clerical Factors. The factor analysis of the checklist of knowledge components yielded two factors, Stenography-Typing and Filing and General Clerical, which correspond to a single factor from the checklist of clerical functions. To illustrate the differences between the two factors of the clerical knowledges list, Table 5 presents all knowledge items correlating .30 or higher with either one of the factors. The Stenography-Typing factor is characterized by such knowledges as: Writing and transcribing shorthand; Spelling, punctuation, and grammar; and Organization and layout of business correspondence. On the other hand, the knowledge items with significant loading on the Filing and General Clerical factor include: Maintaining systematic file, Inserting and removing items from systematic files and classifications, and Filling in routine forms. In general the items which are heavily loaded in one factor show little relationship with the other factor. Only two items correlate .30 or above with both factors. These are concerned with spelling, punctuation, and grammar and with operating a typewriter. 


TABLE 5 (title fragment) 
... COMPONENTS ON STENOGRAPHY-TYPING FACTOR AND FILING AND GENERAL CLERICAL FACTOR 
Column heading: Loading (Stenography-Typing; Filing and General Clerical) 


TABLE 6 
... CLERICAL FUNCTIONS ... AND GENERAL CLERICAL ... 
(loadings not recoverable from this copy; surviving row labels) 
Keyboard operation 
Translates (shorthand ... 
Nonkeyboard machines 
Assembles forms, mat... 
Manual tasks (e.g., ... 
Inscribes (writes ... 
Locates positions in ... 
Delivers 
Public contact 


The single factor emerging from the checklist of clerical functions, Stenography-Typing and General Clerical, appears to encompass both factors of the knowledge list. The functions which load significantly on it pertain to keyboard and nonkeyboard machines, translating shorthand and other codes, filing, and also to such lower level tasks as assembling and delivering forms and materials. A complete list of items with loadings of .30 or above is included in Table 6. 
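
The bookkeeping behind statements such as "items with loadings of .30 or above" and "only two items correlate .30 or above with both factors" can be sketched as below. The loadings are invented for illustration and are not the study's values. The item and factor labels are borrowed from the text, and the function names are hypothetical.

```python
def items_by_factor(loadings, threshold=0.30):
    """Group items under each factor on which they load at or above
    the threshold. loadings maps item -> {factor: loading}."""
    grouped = {}
    for item, by_factor in loadings.items():
        for factor, value in by_factor.items():
            if value >= threshold:
                grouped.setdefault(factor, []).append(item)
    return grouped

def shared_items(loadings, threshold=0.30):
    """Items loading at or above the threshold on two or more factors."""
    return [
        item
        for item, by_factor in loadings.items()
        if sum(value >= threshold for value in by_factor.values()) >= 2
    ]

# Invented loadings, for illustration only.
loadings = {
    "Shorthand": {"Stenography-Typing": 0.70, "Filing and General Clerical": 0.05},
    "Operating typewriter": {"Stenography-Typing": 0.55, "Filing and General Clerical": 0.35},
    "Maintaining files": {"Stenography-Typing": 0.10, "Filing and General Clerical": 0.60},
}
print(items_by_factor(loadings))
print(shared_items(loadings))
```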


DISCUSSION 


In the interpretation of the results, two 
limitations should be mentioned. First, the 
job information utilized was derived from 
analysis of detailed job schedules rather than 
from actual job observation and interview. 
Though these schedules were prepared by 
trained occupational analysts of the USES, 
it is possible that somewhat different results 
would have been obtained had direct obser- 
vation of the jobs been possible. Secondly, 
only those checklist items which occurred in 
at least 10 jobs entered into the factor anal- 
yses. A total of 31 items of the knowledge 
checklist and 6 items of the checklist of 
clerical functions failed to meet this criterion. 
For example, a number of knowledge items 
concerned with automatic office machines and 
with punch card accounting had to be omitted 
because of low frequencies. Similarly, clerical 
functions involving planning and recommend- 
ing or advising action were dropped for the 
same reason. Had these items occurred fre- 
quently enough to be included in the factor 
analyses, it is possible that additional factors 
may have emerged. 

Nevertheless, it is safe to conclude that the factors that did emerge represent meaningful dimensions of clerical activity as typified by the jobs studied. This conclusion is strengthened by the strong similarity of results for the two factor analyses using different job variables. Lending further support to the results are the findings reported by Thomas (1952), who analyzed clerical job components and identified eight clusters of office operations which correspond closely to the factors identified here. 

In addition, the results of the study suggest that the development, experimental application, and factor analysis of job analysis checklists can contribute to the identification of common denominators among different jobs. To what extent the factors identified here can actually be employed as a basis for classifying clerical jobs or to what extent they will contribute to providing a basis for the construction and validation of selection tests, remains to be determined. One result is certain, namely, that the application of factor analysis to job oriented checklists provides a structure for organizing and relating the individual elements which comprise a job. 
TRAINING CONDITIONS, ABILITY, AND ACADEMIC OUTCOMES 


HERBERT P. FROEHLICH 


Naval Air Technical Training Command, Memphis, 

To test the view that a no-setback approach in naval training is academically as effective as the present setback system yet more economical, 10 “no-setback” classes comprising 1044 basic electronics students were compared with 12 “setback” classes comprising 1249 students. The students were followed into 6 advanced specialized schools. The no-setback group had a greater percentage of academic drops than the setback (significant at the .01 level); both groups had the same percentage of administrative setbacks and drops. For the most part, differences between course grades were contributed by low ability students. In the more advanced schools academic differences disappeared. Several suggestions were made as to how both systems might be combined so that the advantages of both would be maximized. 


Psychologists engaged in research in ongoing educational and training situations have experienced considerable difficulty in formulating fruitful hypotheses as to means of increasing learning. In numerous instances experimental conditions which appeared quite promising, if not obviously effective, have yielded a monotonous “no significant difference” in control group type experiments. Many of these “negative results” have never been published, yet the psychologist in this field who cannot give several examples of the above from personal experience is rare. 

Several reasons for the phenomenon have been advanced. One of these states that the effect of the experimental condition may be there but our control of experimental conditions is not adequate to permit the observation of a significant change which could be attributed to the experimental condition. Similarly, it has been proposed that our research methods themselves require further sharpening before they will be adequate to the task. The thought might also be advanced that so much of the variance in a distribution of course grades, or similar measures, is associated with aptitude and error that there is hardly any left that may be attributed to different training methods and other experimental conditions of this nature. The implications of this latter interpretation would be to emphasize: (a) selection and (b) more economical means of training, accomplished without a loss in learning, as opposed to attempting to increase learning by improved training methods. The present study is oriented toward both of these implications. 


PROBLEM 


A major question with which this study deals was whether a group of students
in a basic electronics course, who were not “setback” for failures within the
course, could reach an academic level equivalent to students who were
setback, that is, those given a second, third, or even a fourth chance to
repeat. The institution of a “no-setback” approach into training was thought
to be more economical than the present setback system, yet academically as
effective. The primary requirement of such a system would be the production
of a sufficient number of trainees adequately grounded in basic electronics
to assimilate more advanced and specialized training. This requirement
suggested a follow-up into the more advanced schools where the academic
achievement of students from both experimental conditions could be evaluated.
Since the ability level of students is not always of as high a quality as at
present, and it would be expected to bear on the efficiency of the no-setback
system, it too was considered.

In the report several unfamiliar terms are used. Standard students are
trainees who under normal circumstances satisfactorily complete the course of
instruction within the prescribed course length, that is, without setbacks.
A setback is the repetition of a formal unit of instruction. Setbacks are
usually given students who make grades below 62 on one of the tests
administered within a phase or who have averages below 62 for the phase.
Repeaters are trainees who were set back one or more times. A phase comprises
a segment of the course in which various closely related topics are taught.
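The setback criterion defined above can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and the phase average is assumed to be the unweighted mean of the test grades, which the report does not state.

```python
PASSING_GRADE = 62  # passing mark given in the text


def needs_setback(test_grades):
    """Return True if the phase would normally be repeated.

    Assumption (not from the report): the phase average is the
    unweighted mean of the test grades within the phase.
    """
    if not test_grades:
        raise ValueError("a phase needs at least one test grade")
    phase_average = sum(test_grades) / len(test_grades)
    failed_a_test = min(test_grades) < PASSING_GRADE
    failed_the_phase = phase_average < PASSING_GRADE
    return failed_a_test or failed_the_phase
```

With an unweighted mean the average criterion can never fire on its own, since an average below 62 implies at least one test below 62. It is kept as a separate condition because the actual phase average may have been weighted differently.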


DATA COLLECTION AND ANALYSIS 


The subjects in this study were students in a 19-week basic electronics
course designed as preparation for more advanced training. Ten no-setback
classes comprising 1,044 basic electronics students preceded 12 setback
classes comprising 1,249 students. This order minimized the number of
setbacks from one group that could conceivably be found in the other. Data
collected included four phase grades, the final average, the final
comprehensive examination, and the Navy Basic Test Battery, which contains a
verbal intelligence test, the General Classification Test (GCT), an
Arithmetic Test (ARI), and a Mechanical Test (MECH). These Navy tests have a
mean of 50 and a standard deviation of 10.

Each student was considered with his entering class; that is, whether or not
the student was set back, he was counted with his original class for purposes
of estimating the percentage that dropped from class, who were repeaters,
etc.


RESULTS 


Drops, Repeaters, and Graduati 


Since the two groups were identical with respect to ability (as measured by
the Navy Basic Test Battery), age, and years of education we may be assured
that any group differences are due to the experimental conditions rather
than to differences in the characteristics of the students. The GCT means
were about 62 with standard deviations of 5.5. Students in both groups
averaged 20 years of age with 12 years of education.

The results as reported in Table 1 indicate that the setback group had
proportionally fewer academic drops than the no-setback group. This
difference was significant at the .01 level. There were no group differences
on the number of administrative drops or administrative repeaters. This was
expected since administrative drops and setbacks deal only with such cases
as hospitalization, and other strictly nonacademic problems, and should not
be affected to any great extent by the conditions of instruction.

Although there was no provision for academic setbacks in the no-setback
group, we were able to get a very accurate estimate of the number of
theoretical academic repeaters. This was accomplished by subtracting the
number of drops, graduates, administrative setbacks, and any academic
setbacks (all known from the original statistics), from the number in the
sample. One may verify this procedure with the data for the setback group
shown in Table 1.

There were no differences when the estimate for the number of repeaters in
the no-setback group was compared with the actual number of repeaters in the
setback group. Table 1 points out the fact that a greater number of setback
individuals were standard students than no-setback. This result was not
expected since the number of standard students is a function of student
ability level and the difficulty level of the subject matter rather than of
whether there are setbacks or not. Although not of great magnitude, the
difference was significant at the .05 level. The major difference found in
Table 1, then, was that on number of drops.

TABLE 1

(Drop and setback data for the setback and no-setback groups; table body not
recoverable.)

Parenthetically, a summary statement could be made at this point concerning
costs. Taking into account the cost of instructional and support personnel,
maintenance and equipment, and student pay, a theoretical savings of 3.98%
over the cost of operations would be realized if the no-setback system were
used instead of the present setback approach. During a period in which the
ability level of students entering training is lower than at present, the
effect of no-setbacks might present a major problem in getting the required
number of men to fill fleet jobs. Under such conditions we might expect an
even greater number of drops than found in this study and would believe that
setbacks could minimize such losses. We might suppose that with a decrease
in the ability level of entering students both drops and setbacks would
increase. Data presented suggest what might happen under the above
circumstances.

At the moment, the ability level of students entering training is such that
approximately 20% have a GCT-ARI combination of 115 and below, and 80% have
a combination of 116 and above. In terms of drops from the setback group,
there would be 24.0% in the low ability group and 7.7% in the high group.
With the evidence offered above showing a greater proportion of drops in the
no-setback group, one might expect that the low ability students of a
no-setback group would contribute an even greater proportion of drops than
their setback counterparts.
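The arithmetic behind this argument can be made explicit. The sketch below combines the two setback-group drop rates quoted above (24.0% and 7.7%) into an overall rate for a given share of low ability entrants. The function name is hypothetical, and holding the within-group rates fixed while the mix changes is an assumption, not a finding of the study.

```python
def overall_drop_rate(low_share, low_rate=0.240, high_rate=0.077):
    """Weighted drop rate for a class with the given share of low ability students.

    low_rate and high_rate are the setback-group figures quoted in the text;
    keeping them constant as the ability mix changes is an assumption.
    """
    if not 0.0 <= low_share <= 1.0:
        raise ValueError("low_share must be a proportion between 0 and 1")
    return low_share * low_rate + (1.0 - low_share) * high_rate


current_mix = overall_drop_rate(0.20)   # the 20% / 80% mix described above
poorer_mix = overall_drop_rate(0.40)    # hypothetical lower-ability intake
```

Under the stated mix the overall setback-group drop rate works out to about 11%; doubling the low ability share to a hypothetical 40% would raise it to roughly 14%.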

As to an estimate of the number of setbacks for no-setback graduating
students, low ability students have 123 setbacks for every 100 students
entering training; high ability students have only 64 setbacks for the same
number. Similar data for the setback group showed a ratio of setbacks to
entering students of 94/100 for the lower ability students, and 61/100 for
the higher ability students. Thus, in summary, there were no substantial
differences in the number of setbacks given graduates of high ability, but
no-setback graduates of low ability had more potential setbacks than setback
graduates.


Academic Achievement 


Table 2 presents a comparison of the setback and no-setback groups on test
and phase grade means. Although not of great magnitude, a statistically
significant difference in final average is noted. On the other hand, a
difference of 3.67 grade points, significant at the .01 level, was found on
the final comprehensive. Not only was the no-setback final comprehensive
mean much lower than the setback mean, but the classes making up the
no-setback group also had a range of scores which brought many quite below
the desired quality.


It should be mentioned that the final average is a composite of the four
phase grades and the comprehensive examination, whereas the comprehensive is
an examination covering all areas of the course. Because of the nature of
these averages it is not surprising that the final comprehensive is more
discriminating than a final average which includes achievement over many,
but short, learning periods.

The differences in Table 2 are somewhat conservative since the poorer
students had dropped out of the no-setback group; recall that there were
3.8% more drops from this group. In addition, the setback group means
included the passing grades of students who were set back. That is to say,
instead of averaging in the student’s actual grade made at the time of
failure, such as 44, the minimum passing grade of 62 was used when the
setback student successfully repeated the phase. In the same vein, one also
did not know how much greater than 62 the setback student’s second grade
was. In the case of the no-setback group, the mean contained the student’s
actual grade, even if it was below 62, since a second attempt was not
offered.

TABLE 2

DIFFERENCES BETWEEN THE SETBACK AND NO-SETBACK MEANS AND STANDARD
DEVIATIONS: TEST AND PHASE GRADES

(Table body not recoverable.)

But how great a difference in means is there because of this? Surprisingly
little. When the grades of 62 for the setbacks on the final comprehensive
were replaced by the actual failing grade, in order to make the setback and
the no-setback means comparable, the setback mean dropped to 77.03 and the
standard deviation increased to 8.71. It can be seen that the change was not
very great (77.19-77.03), and that the difference between the two groups
remained significant.

With reference to the achievement of graduates of high and low ability (as
defined above), it may be well to consider the data of Table 3. Academically
speaking, the low ability students did not do as well as high ability
students in either the setback or the no-setback groups. However, the low
ability setback group suffered much less academically than the low ability
no-setback group. This is especially true with respect to the final
comprehensive where the means were 74.03 and 69.02 respectively. The
no-setback condition affected the high ability group, yet to a much lesser
degree than the low ability group. Note that these data are only for
graduates, and that the conditions making for conservative estimates prevail
here as they did for the data in Table 2.

TABLE 3

THE SETBACK AND THE NO-SETBACK GROUP: LOW AND HIGH ABILITY STUDENTS

(Table body not recoverable.)

TABLE 4

(Final average means for the six specialized schools; table body not
recoverable apart from the entries 73.90** and 74.52.)

Follow-Up into the Specialized Schools


Data on the six specialized schools which immediately follow basic
electronics were collected. These schools graduated 1,055 setback and 822
no-setback students. The results indicate that, except for the ATS school in
which a newly devised test had been administered to setback students, the
groups did not differ in their final averages or in the percentages of
setbacks given. Table 4 shows the final average means for the six schools.
One can see that all differences between means found in the basic school
were “washed out” in the specialized schools.
DISCUSSION AND CONCLUSION


Results such as these should not be generalized to other situations without
regard to the ability level of the students entering training and, as well,
the nature of the training. It must be kept in mind that if this study were
done with students of lower academic ability, results might have pointed to
greater limitations of the no-setback approach in terms of the number of
students graduating and their level of achievement. Possibly if the entering
students were of uniformly high ability a no-setback system could be put
into use on a practical basis; that is, more economical than the setback,
while maintaining a consistently high number of graduates qualified for
further training.

In consideration of these data a compromise between the setback and the
no-setback systems might be reached such that setbacks are used during the
first two phases only. Such an approach would assist the student in gaining
a full understanding of principles before moving on with more difficult
subject-matter. With fairly easy material, however, it is easy to see how a
student might be lax in light of possibilities for another chance through
setbacks.

Possibly an even better approach would be the use of setbacks in the later
phases. This would eliminate the poorer individuals early: that is, those
who would drop even if they had another chance. The approach might motivate
students at the outset to apply themselves, and with setbacks in the later
phases, would reward those who have proven their worth in the early phases.
Students having failed tests in the early phases would not be set back.
However, with two such failures they would be dropped from the course.


(Received May 15, 
No significant differences at the more advanced school level would seem to
indicate the practicality of the no-setback system. It should be kept in
mind, however, that in the basic school the no-setback system produced the
greater percentage of drops, and it affected the achievement of low ability
students. For a no-setback system to develop students qualified for further
training, high ability students must be selected.

A combination setback and no-setback approach might take advantage of the
best of both systems. For instance, setbacks would be given only those
individuals who fail a test and who, in the opinion of a selection board,
really needed it. In this way, instead of setting everyone back, one would
be taking advantage of the economy of the no-setback system without
suffering the significant increase in course failures.

The lack of differences in the more advanced schools may have arisen merely
because the differences in the basic school were not very great. An
on-the-job performance criterion would probably also show no significant
differences between the groups. We found no reason to believe that the lack
of differences was a consequence of differences in subject-matter between
the two academic levels or due to greater instructor assistance to former
no-setback students. We were convinced, however, that there was merit in
following students beyond the phase in which the original experimental
manipulation was carried out. With an eye toward the future, we would
predict that the less lenient quality of the no-setback system would, in the
long run, motivate individuals to such a degree that they would be at least
the academic equals of setback students, if not superior to them.
THE EFFECT OF THE PSEUDOVOLUNTEER ON STUDIES OF VOLUNTEERS FOR PSYCHOLOGY
EXPERIMENTS


EUGENE E. LEVITT, BERNARD LUBIN, AND JOHN PAUL BRADY


Indiana University Medical Center


30% of the student nurses promising to participate in an hypnosis study failed
to keep the experimental appointment. When these pseudovolunteers are
included with the volunteers, and compared with nonvolunteers, there are 2
significant differences of 38 comparisons of variables. When the
pseudovolunteers are included with the nonvolunteers, 8 comparisons, 1 of
which was previously significant, are now significant. It is suggested that
the definition of a volunteer may have bearing on the outcome of a study of
volunteer personality or behavior, and that future study should utilize
actual participation in an experiment as the definition.

A majority of psychological experiments 
with human subjects—possibly a very large 
majority—employ volunteer samples. The ex- 
perimenter may be aware of the possibility of 
thus introducing sampling bias, but there is 
little that he can do about it. True random 
sampling from any large population is prac- 
tically impossible. The expediency is neatly 
stated by Riggs and Kaess (1955): 


The more common procedure is to use easily obtained groups and to hope that
the process of selection has not introduced biasing factors which would
make the group studied greatly different from one selected by random
sampling, at least regarding the distribution of the dependent variable
(p. 229).


Despite the potential importance of the possibility of sampling bias for
inferences and conclusions, there has been only a scant handful of
investigations of volunteers and volunteering behavior. About half of these
reported fairly large numbers of personality or behavioral differences
between volunteers and nonvolunteers, while the other half found only minor
differences, or no differences.

The overall picture is extremely difficult to evaluate. Investigations
differ unsystematically with respect to a number of factors which could
logically influence results. Among these are: the use of, and kinds of
incentive; the general area of the study as it is presented to the potential
subjects; the nature of the subject group itself; the relationship of the
experimenter to the subjects; and the measures applied to volunteers and
nonvolunteers.

Interstudy variation among these factors is considerable and, together with
the relatively small number of studies, precludes any systematic survey.

Most of these factors are usually reported in the experimental study of
volunteering behavior. However, there is another factor which may be of
critical significance, which often is not reported by the investigator. This
is the validity of volunteering.

Many studies of volunteering behavior are designed only to identify
volunteers, and there is no intention to obtain subjects for an actual
experiment. An individual volunteers verbally, or by signing a card, or by
raising his hand, or otherwise promises to participate in the study. There
is often no check on this promise because the purpose of the study does not
require it.

We suspect that in many studies of volunteering behavior there were a number
of pseudovolunteers, that is, individuals who promised to participate, but
who would have failed to follow through when the time came.

In the studies of Schubert (1960), Martin and Marcuse (1958), Himelstein
(1956), Scheier (1959), and Howe (1960), subjects promised to participate
but the promise was never checked by actual experiment. It is likely that
there were some pseudovolunteers in these investigations. The percentage of
pseudovolunteers was probably greatest in studies like those of Schubert
(1960), and Martin and Marcuse (1958), in which the alleged volunteers were
specifically told that they would be contacted later. On the other hand, in
studies like those of Maslow and Sakoda (1952), Brower (1948), and Riggs and
Kaess (1955), a volunteer was defined as an individual who actually did
participate in an experiment. Hence, there would be no pseudovolunteers in
these investigations.
When the promise to participate is not verified, pseudovolunteers are lumped
with actual volunteers. When there is verification, the pseudovolunteers
must be included with the nonvolunteers. It is reasonable to suggest that
these two types of studies could lead to quite different conclusions about
volunteers and volunteering behavior. The present investigation was designed
to test this possibility.


PROCEDURE AND SUBJECTS 


The choices of procedure and measuring instruments were based on the intent
to eventually use the volunteers as subjects in ongoing research on the
personality correlates of hypnotically induced emotions (cf. Grosz &
Levitt, 1959; Levitt & Persky, 1960). The measures were selected largely
because they appear to tap a broad range of personality characteristics, or
were considered to be directly relevant to an investigation of emotional
behavior under hypnosis. The subjects were drawn from the same group which
has furnished most of our research data in this area.

The following measures, most of which furnish multivariate data, were
administered to 76 sophomore student nurses, in two group testings: the
Guilford-Zimmerman Temperament Survey (G-Z: 10 variables); the Edwards
Personal Preference Schedule (EPPS: 16 variables); the IPAT Anxiety Scale
(IPAT: 4 variables); the Allport-Vernon-Lindzey Study of Values (AVL: 6
variables); group Rorschach (RDep: 1 variable);¹ and an inventory designed
to measure knowledge of hypnosis (HII: 1 variable). The total number of
variables derived from the six measures was 38. N varied from 71 to 74 for
the second testing session due to absences and inadequately completed forms.

After the first testing session, the students were asked to volunteer for a
study involving hypnosis. The payment, the time involved, and the general
structure of participation were explained, but the specific nature of the
experiment was not stated. Each student indicated by writing on a slip of
paper whether or not she chose to volunteer. The group was told that a
scheduled participation could be canceled and rescheduled later, but if the
volunteer failed to appear without giving prior notice, she would be
automatically dropped from the experiment. Fifty-four students promised to
participate, and 22 declined.

The potential volunteers were divided into groups of 26 and 28, each
assigned to a hypnotist-experimenter, one of whom had issued the original
call for volunteers. Two weeks to a month later they were scheduled for
participation in an experiment during hours when they were known to be free.
Each student was contacted by mail by the experimenter 5-7 days prior to the
scheduled appointment, and was instructed to call if the hour was unsuitable
for any reason.

¹ A measure of dependency based on content.


TABLE 1

t COMPARISONS OF MEAN DIFFERENCES BETWEEN VOLUNTEERS AND NONVOLUNTEERS WHEN
PSEUDOVOLUNTEERS ARE INCLUDED WITH VOLUNTEERS AND WITH NONVOLUNTEERS

(Table body not recoverable.)


RESULTS 


A total of 16 students, or nearly 30% of the potential volunteers, failed
scheduled appointments to participate. Seven had been contacted by one
hypnotist-experimenter, and nine by the other. The difference between the
experimenters in this respect is not significant (χ² = .01).
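The reported chi square can be approximated from the figures given. The sketch below makes two assumptions that the report does not state: that the seven failures came from the group of 26 and the nine from the group of 28, and that a Yates continuity correction was applied. The function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi square for the 2 x 2 table [[a, b], [c, d]].

    yates=True applies the continuity correction (an assumption here;
    the report does not say which formula was used).
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs a nonzero total")
    difference = abs(a * d - b * c)
    if yates:
        difference = max(difference - n / 2, 0)
    return n * difference ** 2 / denominator


# Rows: experimenter; columns: failed appointment, kept appointment.
# Pairing 7 with the group of 26 and 9 with the group of 28 is assumed.
chi2_corrected = chi_square_2x2(7, 19, 9, 19)
chi2_uncorrected = chi_square_2x2(7, 19, 9, 19, yates=False)
```

Under these assumptions the corrected value is about .015, in line with the reported .01; the uncorrected value is about .18. Either way the difference is far from significance.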

The overall N of 76 thus breaks down into subgroups of 16 pseudovolunteers
(PV), 38 participant volunteers (V), and 22 nonvolunteers (NV).

Two sets of comparisons of mean scores were made, using t. The original,
potential volunteer group of 54, which included both the volunteer and
pseudovolunteer groups, was compared with the nonvolunteer group of 22. The
pseudovolunteers were then removed from the volunteer group and added to the
nonvolunteer group. The means for the 38 participant volunteers were then
compared with the means for the 38 students who did not participate (16
pseudovolunteers plus 22 nonvolunteers).

The t’s for the various comparisons are listed in Table 1.

Only two of the t’s for comparisons of the original volunteer group against
the nonvolunteers reach significance at the 5% level. When the
pseudovolunteers are included with the nonvolunteers and this PV + NV group
is compared with the participant volunteers, one of the two t’s is no longer
significant. (The direction of the one, common significant difference is the
same in both sets of comparisons.) However, seven additional t’s now attain
the 5% level. There is, then, a total of eight discrepancies between the two
sets of comparisons.


DISCUSSION AND CONCLUSIONS 


The pseudovolunteers constituted a sizeable proportion of those who promised
to participate. This may be a function of the type of experiment. For
instance, Belson (1960) reported that only 5% of the persons promising to
participate in a BBC study of audience reaction failed to appear. The
absence of difference in the pseudovolunteer percentage between the two
hypnotist-experimenters suggests that perception of the experimenter issuing
the call for volunteers was not a highly influential factor.

The contrast between the sets of comparisons of means is striking. When the
pseudovolunteers are included with the volunteers, only two of the 38
comparisons reach the 5% level. When the pseudovolunteers are pooled with
the nonvolunteers, eight t’s, one of which had been significant in the
former analysis, now attain at least the 5% level. If we take binomial
probability as an approximate index, there is better than one chance in two
that two of 38 comparisons could be significant by chance alone, while the
probability of obtaining eight of 38 by chance is beyond .001 (Sakoda,
Cohen, & Beall, 1954).²
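The two binomial figures quoted here can be checked directly. The sketch below treats the 38 comparisons as independent trials, each with a .05 chance of reaching significance; independence is the crude approximation the authors themselves acknowledge. The function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """Upper-tail binomial probability P(X >= k) for n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 38 comparisons, each assumed to have a .05 chance of a spurious "hit".
p_two_or_more = prob_at_least(2, 38, 0.05)
p_eight_or_more = prob_at_least(8, 38, 0.05)
```

The first probability comes out a little above one half and the second below .001, which agrees with the statement in the text.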

It is almost gratuitous to note that a chi square test for related
frequencies (Siegel, 1956) yields a value of 3.13, p < .05 (one-tailed
test), indicating that the two sets of comparisons differ with respect to
the number of significant t’s. The important point is that the sets would
lead to very different inferences concerning the personalities of volunteers
and nonvolunteers.
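The value of 3.13 is consistent with a McNemar-type test on the discordant comparisons. In the sketch below the counts are taken from the Results (seven t’s significant only in the second analysis, one significant only in the first); the use of a continuity correction is an assumption, and the function name is hypothetical.

```python
def mcnemar_chi_square(b, c, corrected=True):
    """Chi square for related frequencies from the two discordant counts.

    b and c are the numbers of cases that changed in each direction.
    corrected=True subtracts 1 from |b - c| (continuity correction,
    assumed here rather than confirmed by the report).
    """
    if b + c == 0:
        raise ValueError("at least one discordant case is required")
    difference = abs(b - c)
    if corrected:
        difference = max(difference - 1, 0)
    return difference ** 2 / (b + c)


# 7 comparisons became significant, 1 ceased to be significant.
chi2_related = mcnemar_chi_square(7, 1)
```

With these counts the corrected statistic is 25/8 = 3.125, which matches the reported 3.13 after rounding.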

It is possible that the results of this study are peculiar to an hypnosis
experiment, although the report of Martin and Marcuse (1958) suggests that
this is not the case. Or, the results may be at least in part a function of
the measuring instruments. It is certainly indicated, however, that the
definition of a volunteer may have considerable import for the experimental
outcome. It also seems logical that a study of volunteers or volunteering
behavior ought to be based on individuals who actually participate in
experiments, and that those who fail to keep the promise to participate
should not be considered as volunteers. Otherwise, the investigation
concerns promised volunteers, not actual volunteers.


² Binomial probability, which requires independent data, is, of course, a
crude approximation in the present study. Not only are the data
nonindependent in the sense of having been obtained from the same subjects,
but trait scores on forced-choice instruments like the EPPS and the Study of
Values are mathematically nonindependent. However, this does not seriously
affect the salient outcome of our study, namely, that qualitative inferences
from the two sets of data analyses would necessarily be quite different.


notically induced anxiety. P
Martin, R. M., & Marcuse, F. L. Characteristics of volunteers and
nonvolunteers in psychological experimentation. J. consult. Psychol., 1958
479
The Hand Skills Test, a device which measures “persistence beyond minimum
standards on tiring tasks,” was used to predict school grades and job
performance evaluations for higher and lower aptitude Navy personnel. 3
enlisted samples and 1 officer candidate sample were employed. Within each
sample men were divided into higher and lower aptitude groups at the median
of their aptitude test scores. Principal findings were: (a) the Hand Skills
Test significantly predicted school grades of the lower aptitude enlisted
samples (grades were not available for the 3rd enlisted sample) but did not
predict for higher aptitude enlisted men or for officer candidates and (b)
the Hand Skills Test significantly predicted job performance evaluations
among lower aptitude men in all 4 samples, but again validities were not
significantly different from zero among the 4 higher aptitude samples.


In the Navy, aptitude tests are generally composed of measures of verbal,
arithmetic, and mechanical ability. In both enlisted and officer selection
programs, cutting scores on these tests are established to screen out
potential failures, while at the same time providing sufficient men to fill
the available openings. Since aptitude tests are not infallible predictors,
it is recognized that some of the men rejected for a given program because
of low test scores might have succeeded in the program. An increase in the
in-service pool of manpower would result if men with marginally low aptitude
scores who could succeed were identified by secondary selection tests.*

The Hand Skills Test was constructed as part of an effort to develop
noncognitive predictors to supplement standard aptitude scores. It attempts
to measure “persistence beyond minimum standards.” Prior studies (Kipnis &
Glickman, 1962) found the Hand Skills Test was a valid predictor of school
grades and of supervisors’ evaluations of job performance in the Navy. The
present research investigates whether the Hand Skills Test is equally useful
as a predictor among high and low aptitude men.

The views expressed here do not necessarily represent those of the United
States Navy.
The author wishes to thank Albert S. Glickman for his aid and valuable
suggestions.
* By marginally low aptitude scores are meant scores which are slightly
below the test scores needed to enter a given program.


The approach used in the study can be compared with the work of Frederiksen
and Melville (1954) who found that a test’s validity could be increased by
taking into account its interaction with a second test or variable. Saunders
(1956) has proposed the term moderator variable for the outside variable
which interacts with the predictor variable. In terms of the present study,
if it were found that the Hand Skills Test was more valid for low aptitude
men than for high aptitude men, the most efficient use of the test would be
to restrict its use to the low aptitude group where it “works best.”

PROCEDURE

Samples


The data were drawn from previously published 
reports (Kipnis & Glickman, 1962; Guttman & 
Wollack, 1961). The four samples used here were 

1. Aviation Machinist Mates (AM). One 
dred and twenty third-class petty officers 
tested on the job. Evaluations of job performance 

from their immediate 
the same time as the testing was done 

2. Radiomen (RM). One hundred and thirty-five 
men, mostly seamen, were tested when they entered 
Class A Radioman School. Evaluations of job per 
formance obtained for 128 of men 14 
months later. Six aptitude test 


hun 
were 
obtained 


were supervisors at 


were these 


men missing 
scores, leaving a sample of 122 men 

3. Nuclear Power Personnel (NP). Two hundred and
forty men in several job specialties, ranging from
seamen to chief petty officers, were tested
when they entered the United States Naval Nuclear
Power School for training to serve on nuclear
powered submarines. Supervisors’ evaluations of job
performance were obtained for 117 of these men 30
months later.

4. Officer Candidates (OC). One hundred and eight
OCs were tested when they entered Officer Candidate
School, Newport, Rhode Island. Evaluations of
performance were obtained from shipboard officer
supervisors approximately one year after graduation
from Officer Candidate School.


The Hand Skills Test 

The test sought to measure self-generated motivation
to persist beyond minimum standards on tiring
tasks. It consists of sequentially numbered boxes in
which examinees pencil five tally marks. The test
rapidly promotes hand and arm fatigue and is
presented to the subjects as a measure of how rapidly
people can use their hands and fingers. It has a
one-minute practice session and three parts of 4
minutes each. A “passing score” is announced prior to
each of the 4-minute parts. Pretesting had established
that this score could be reached by all examinees in
the time allowed.

The test seeks to differentiate between those who
stop or slow down after the passing score is reached
and those who continue to strive. The score used
is Number Completed in Part Three minus Number
Completed in the Practice Session.


Measures of Aptitude


The General Classification Test (GCT) was used
as the measure of aptitude for AMs, NPs, and
RMs. The GCT is the measure of verbal reasoning
ability in the Navy’s Basic Test Battery and is
used to classify enlisted men when they first
enter the Navy.

Scores on the Officer Qualification Test
(OQT) were used for OCs. The OQT is used to
select officer candidates from college graduate
applicants and is composed of verbal, arithmetic, and
mechanical items, which are summed to obtain a
single score.


Measures of School Performance


Final academic grade was used as the measure of
school performance for RMs and OCs. For NPs, the
eighth week average grade was used. This latter
grade correlates .90 with final nuclear power grade
at the end of 6 months and was used to permit an
earlier start on the analysis. No school grades were
available for the AM sample.


Measures of Job Performance 


The following measures of individual job
performance were used:

1. AMs. Official performance evaluations for the
prior year were abstracted from each man’s records
and unofficial evaluations were collected from each
man’s supervisor. These latter evaluations consisted
of ratings on three scales, summed to obtain a
single score: Willingness to Work, Technical
Competence, and Overall Acceptability of the Man. The
two sources of information were summed to obtain
a single score.

2. RMs. Supervisors’ ratings on five scales, summed
to obtain a single score: Willingness to Work,
Technical Competence, Respect for Authority, Ability to
Get Along with Shipmates, and Overall Acceptability
of the Man.

3. NPs. Supervisors’ evaluations of Ability to
Operate Equipment.

4. OCs. From Officer Fitness Reports, the average
rating on five performance factors: Professional
Knowledge, Cooperation, Judgment, Leadership, and
Promotion Potential.

TABLE 1 [analysis scheme; table body not recoverable; legible heading: Aptitude test score]


Analysis 


The data were organized for analysis in the
following way:

1. Based upon their Hand Skills Test scores, men
within each sample were divided into a Low
Persistence Group (approximately the bottom third of
the distribution of Hand Skills scores) and a High
Persistence Group, composed of all other men.

2. High and Low Aptitude Groups were formed
by dividing men within each sample at the median of
their aptitude scores.⁴

3. Within each sample, two school criterion groups
were formed based upon the distribution of school
grades: a Below Average School Group (men whose
school grades fell in approximately the bottom third
of the distribution) and an Average School Group,
composed of all other men in the sample.

4. Based upon the joint distribution of the
dichotomized aptitude test scores and the dichotomized
Hand Skills Test scores, a four-fold table was formed
for each sample.

5. Next, the number of men who were categorized
as Below Average School and the number of men
who were categorized as Average School were entered
into each cell of the four-fold table.

The scheme used to analyze the data is shown in
Table 1.
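The grouping steps above can be sketched in Python as follows. This is an illustration only, not the procedure as originally carried out: the field names (`hand_skills`, `aptitude`, `grade`), the function names, and the handling of ties (a score exactly at a cutoff is placed in the lower group) are all assumptions, since the text specifies only "approximately the bottom third" and "the median."

```python
from statistics import median


def bottom_third_cutoff(values):
    """Score at or below which roughly the bottom third of the values fall."""
    ordered = sorted(values)
    k = max(1, round(len(ordered) / 3))  # size of the bottom third
    return ordered[k - 1]


def four_fold_table(men):
    """Build the aptitude x persistence table for one sample.

    `men` is a list of dicts with hypothetical keys 'hand_skills',
    'aptitude', and 'grade'. Each cell holds the counts of men classed
    as Below Average School and as Average School.
    """
    hs_cut = bottom_third_cutoff([m["hand_skills"] for m in men])
    grade_cut = bottom_third_cutoff([m["grade"] for m in men])
    apt_median = median(m["aptitude"] for m in men)

    table = {
        (apt, pers): {"below_average": 0, "average": 0}
        for apt in ("low_aptitude", "high_aptitude")
        for pers in ("low_persistence", "high_persistence")
    }
    for m in men:
        # Assumption: ties at a cutoff go to the lower group.
        apt = "low_aptitude" if m["aptitude"] <= apt_median else "high_aptitude"
        pers = "low_persistence" if m["hand_skills"] <= hs_cut else "high_persistence"
        school = "below_average" if m["grade"] <= grade_cut else "average"
        table[(apt, pers)][school] += 1
    return table
```

The same function would serve for the job performance criterion by passing evaluation scores in place of grades.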


⁴ For AMs the median GCT was 48; for RMs the
median GCT was 54; for NPs the median GCT was
59; for OCs the median OQT was 51.
TABLE 2
RELATIONS BETWEEN THE HAND SKILLS TEST AND SCHOOL GRADES, BY APTITUDE LEVEL AND IN TOTAL SAMPLE
[Table body not recoverable. Column heading: correlations between the Hand Skills Test and school grades.]

6. At each level of aptitude, phi coefficients were
computed between the dichotomized Hand Skills
Test scores and the school criterion (Below Average
School versus Average School). In addition, overall
phi correlations between the Hand Skills Test and
the school criterion were computed, ignoring level
of aptitude. The level of significance of each phi
was evaluated by a chi square test with one degree
of freedom.

7. Two job criterion groups were also formed
within each sample, based upon the distribution of
job performance evaluation scores. These groups
consisted of a Below Average Work Group (men
whose criterion scores fell in approximately the
bottom third of the distribution) and an Average Work
Group, composed of all other men.

8. The analysis used for school grades was
repeated with the job performance criterion, with the
cell entries being the number of men categorized as
Below Average Work and as Average Work.
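The phi coefficient and its chi square test can be illustrated with a short Python sketch. It uses the standard formulas for a 2 x 2 table (phi as the cross-product difference over the square root of the product of the marginals, and chi square equal to N times phi squared, with one degree of freedom). The function names and the cell counts in the usage example are hypothetical and are not taken from the study.

```python
import math

# Critical values of chi square with one degree of freedom.
CHI2_CRIT_05 = 3.841
CHI2_CRIT_01 = 6.635


def phi_coefficient(a, b, c, d):
    """Phi for the 2 x 2 table [[a, b], [c, d]].

    Assumed layout: rows are Low and High Persistence, columns are
    Below Average and Average criterion groups. Phi is then positive
    when low persistence goes with below average performance.
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("a marginal total is zero; phi is undefined")
    return (a * d - b * c) / denom


def chi_square_from_phi(phi, n):
    """Chi square (1 df) implied by a phi computed on n cases."""
    return n * phi ** 2


# Hypothetical counts for one aptitude level of one sample.
a, b, c, d = 12, 8, 6, 34
n = a + b + c + d
phi = phi_coefficient(a, b, c, d)
chi2 = chi_square_from_phi(phi, n)
significant_at_05 = chi2 > CHI2_CRIT_05
significant_at_01 = chi2 > CHI2_CRIT_01
```

Running the calculation once per aptitude level and once on the pooled table gives the three phi's per sample that the analysis compares.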


Restriction in Criterion Variance 

Since OQT and GCT scores correlated with school
grades among OCs and NPs, it was not possible in
these two samples to categorize equal numbers of
High and Low Aptitude men as Below Average
School. Only 17% of the High Aptitude OCs and
[percentage illegible] of the High Aptitude NPs were
classified as Below Average School, compared to 48%
of the Low Aptitude OCs and 40% of the Low Aptitude
NPs. On the other hand, relatively equal numbers of
High and Low Aptitude RMs were categorized as
Below Average School.

There were no statistically significant correlations
between either GCT or OQT scores and job
performance evaluations. Consequently, relatively equal
numbers of High and Low Aptitude men within
each sample could be placed in the Below Average
Work criterion groups.


Restriction in Hand Skills Variance 
Within each sample, there were no statistically
significant differences in average Hand Skills Test
scores between High and Low Aptitude Groups.
Similarly, F tests yielded no statistically significant
differences in Hand Skills score variability between
High and Low Aptitude Groups within each sample.
There was, however, a consistent trend in each
sample for the variance of Low Aptitude Groups’ Hand
Skills scores to be smaller than the variance of High
Aptitude Groups’ scores.


RESULTS 


The results for the three samples are given
in Table 2, which shows the phi correlations
between the Hand Skills Test and school
grades for each aptitude level and for the
total sample.

TABLE 3
PHI CORRELATIONS BETWEEN THE HAND SKILLS TEST AND PERFORMANCE EVALUATIONS, BY APTITUDE LEVEL AND FOR EACH TOTAL SAMPLE
[Table body not recoverable. Column heading: Hand Skills Test and performance evaluations.]
* Significant beyond the .05 level.
** Significant beyond the .01 level.

For the RM and NP samples, the Hand 
Skills Test was significantly related to school 
grades among men with low aptitude scores, 
but not among men with high aptitude scores. 
There was no prediction of officer school
grades among either high or low aptitude OCs.

The phi’s for the two low aptitude enlisted
samples were from .04 to .13 points higher
than the phi’s obtained when high and low
aptitude men were combined in these two
samples.

Table 3 gives the phi correlations between 
the Hand Skills Test and job performance 
evaluations among the four samples, by apti- 
tude level and for each total sample. 

For all four samples, the phi’s were
significant beyond the .05 level among the Low
Aptitude Groups. Validities ranged from .26
to .47. Among High Aptitude Groups, on the
other hand, none of the phi’s were
significantly different from zero. The phi’s for the
Low Aptitude Groups ranged from .13 to .40
points higher than the phi’s for the High
Aptitude Groups, and from [value illegible] to
.20 points higher than the phi’s for the total
samples.


DISCUSSION 


At this time there seems to be little doubt
that the validity of tests may be moderated
by their interaction with some second test
variable. Studies of Frederiksen and Gilbert
(1960), Frederiksen and Melville (1954),
and Ghiselli (1956, 1960a, 1960b) report
striking improvements in test validities after
eliminating “nonpredictable” individuals on
the basis of some third test score. The results
of this study further support these findings.
In the NP and RM samples, the Hand Skills
Test identified men with lower GCT scores
who were more likely to succeed at school. In
all four samples, the Hand Skills Test
identified with fair to considerable accuracy the
lower aptitude men who succeeded on the job.


On the other hand, no test validity was 
obtained among higher aptitude men in any 
of the samples, either at school or on the job. 

Assuming construct validity for the Hand
Skills Test, it would appear that a willingness
to persist beyond minimum standards on
tiring tasks is moderately related to the
school performance, and strongly related to
the job performance of lower aptitude men,
but not of higher aptitude men. Why the
same persistence does not also elevate below
average performance of higher aptitude men
is not clear.

One possible explanation for these
differences may originate in the fact that the Navy,
like most large institutions, screens individuals
for any particular school and job to ensure
that they have the necessary aptitudes to do
the work. Of course, it is not expected that
all selected individuals will do equally good
work. Some will do outstanding work and
others will barely get by. However, on the
average, each selected group’s mental level
will tend to be appropriate for the work they
are assigned. One further point to be noted is
that within any selected group, the higher the
aptitude level of the individual, the more
likely he is to grasp the requirements of the
job.

Under these conditions, it may be seen that
higher aptitude men have some advantages
over the lower aptitude men with whom they
are competing. For one thing, it is easier for
higher aptitude men than for lower aptitude
men to reach any given proficiency level. For
example, a student with a high IQ has to
put forth less effort to obtain a C average
than does a student whose IQ barely exceeded
the entrance requirements for the school. Since
the proficiency level used as the criterion of
performance in this study was relatively modest
(bottom third vs. upper two thirds), it seems
reasonable to believe that higher aptitude men
did not have to work too hard to exceed this
criterion. This then may be the reason why
persistence did not contribute to the
prediction of their performance. Other
considerations, such as lack of interest, or anxiety,
might have led to below average performance
among these higher aptitude men.

On the other hand, the lower aptitude men
in the study probably had to keep “plugging”
away to maintain an average performance
level. Under these circumstances persistence
did contribute to performance, in that it kept
lower aptitude men at their tasks until a
satisfactory comprehension and performance
level was achieved.


Turning to the general problem of identifying
moderator variables, it seems no easy task to
determine when such variables are operative. 
One indication may be a lack of linearity 
between the predictor and criterion. This situ- 
ation has been noted for the Hand Skills Test 
(Kipnis & Glickman, 1962). However, even 
if one suspects the influence of a moderator 
variable, it is not easy to determine the spe- 
cific test acting as moderator. At this time it 
is practically a matter of clinical “hunch” 
and empirical “hunt.” For example, this in- 
vestigation grew out of a general interest in 
how intelligence and achievement might in- 
teract with the Hand Skills Test. A first start 
involved testing whether school grades moder- 
ated the Hand Skills Test’s prediction of job 
performance, using the RM, NP, and OC 
samples. The results were highly suggestive of 
a moderator effect, but were not consistent 
over all three samples. GCT scores were then 
tried for the two enlisted samples and OQT 
scores for the OC sample, using both school 
grades and job evaluations as criteria, with 
the results reported in this study. Subse- 
quently, the job performance results for the 
two enlisted samples were cross-validated with 
the AM sample.

While it is easy to find reasons for the
differences in results obtained by using two
different, but related, moderator variables
(e.g., school grades are more complex than
GCT scores), still it can be seen that separate
testing of each potential moderator is neces- 
sary. At this time there are no statistics, 
analogous to multiple regression techniques, 
to mechanically pick out moderator variables 
from among a matrix of test intercorrelations. 